Abstract

Classifiers are often used to detect miscreant activities. We study how an adversary can
systematically query a classifier to elicit information that allows the adversary to evade
detection while incurring a near-minimal cost of modifying their intended malfeasance. We
generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers
that partition input space into two sets, one of which is convex. We present query algorithms
for this family that construct undetected instances of approximately minimal cost using only
polynomially many queries in the dimension of the space and in the level of approximation.
Our results demonstrate that near-optimal evasion can be accomplished without
reverse-engineering the classifier's decision boundary. We also consider general $\ell_p$ costs
and show that near-optimal evasion on the family of convex-inducing classifiers is generally
efficient for both positive and negative convexity for all levels of approximation if $p = 1$.

Keywords: Query Algorithms, Evasion, Reverse Engineering, Adversarial Learning 

1. Introduction 

A number of systems and security engineers have proposed the use of machine learning
techniques to filter or detect miscreant activities in a variety of applications; e.g., spam,
intrusion, virus, and fraud detection. All known detection techniques have blind spots:
classes of miscreant activity that fail to be detected. While learning algorithms allow the
detection algorithm to adapt over time, real-world constraints on the learner typically allow
an adversary to programmatically find vulnerabilities. We consider how an adversary can
systematically discover blind spots by querying a fixed or learning-based detector to find a
low-cost (for some cost function) instance that the detector does not filter. As a motivating
example, consider a spammer who wishes to minimally modify a spam message so it is not
classified as spam (here cost is a measure of how much the spam must be modified). There
are a variety of domain-specific mechanisms an adversary can use to observe the classifier's
response to a query; e.g., the spam filter of a public email system can be observed by creating
a dummy account on that system and sending the queries to that account. We assume the
attacker has access to a membership oracle for the filter. By observing the responses of
the spam detector, the spammer can search for a modification while using as few queries as
possible.



The problem of near-optimal evasion (i.e., finding a low-cost negative instance with
few queries) was first posed by Lowd and Meek (2005). We continue their investigation
by generalizing their results to the family of convex-inducing classifiers: classifiers that
partition their instance space into two sets, one of which is convex. The family of
convex-inducing classifiers is a particularly important and natural class to examine, as it includes
the family of linear classifiers studied by Lowd and Meek as well as anomaly detection
classifiers using bounded PCA (Lakhina et al., 2004), anomaly detection algorithms that
use hyper-sphere boundaries (Bishop, 2006), one-class classifiers that predict anomalies
by thresholding the log-likelihood of a log-concave (or uni-modal) density function, and
quadratic classifiers of the form $x^\top A x + b^\top x + c \ge 0$ if $A$ is semidefinite, to name a few.
Furthermore, the family of convex-inducing classifiers also includes more complicated bodies
such as the countable intersection of halfspaces, cones, or balls.



We also show that near-optimal evasion does not require reverse engineering the
classifier's decision boundary, which is the approach taken by Lowd and Meek (2005) for evading
linear classifiers. Our algorithms for evading convex-inducing classifiers do not require
fully estimating the classifier's boundary (which is hard in the general convex case; see
Rademacher and Goyal, 2009) or otherwise reverse-engineering the classifier's state.
Instead, we directly search for a minimal-cost evading instance. Our algorithms require only
polynomially many queries, with one algorithm solving the linear case with better query
complexity than the previously published reverse-engineering technique.



This paper is organized as follows. We overview past work related to near-optimal
evasion in the remainder of this section. In Section 2 we formalize the near-optimal evasion
problem, and review Lowd and Meek's definitions and results. We present algorithms for
evasion that are near-optimal under $\ell_1$ cost in Section 3 and we consider minimizing general
$\ell_p$ costs in Section 4. We conclude the paper by discussing future directions for near-optimal
evasion of classifiers in Section 5.

1.1 Related Work

Lowd and Meek (2005) first explored near-optimal evasion, and developed a method that
reverse-engineered linear classifiers. Our approach generalizes their result and improves
upon it in three significant ways.

• We consider a more general family of classifiers: the family of convex-inducing
classifiers that partition the space of instances into two sets, one of which is convex. This
family subsumes the family of linear classifiers considered by Lowd and Meek.

• Our approach does not fully estimate the classifier's decision boundary (which is
generally hard; see Rademacher and Goyal, 2009) or reverse-engineer the classifier's state;
instead, we directly search for an instance that the classifier recognizes as negative
that is close to the desired attack instance (an evading instance of near-minimal cost).

• Even though our algorithms find solutions for a more general family of classifiers, our
algorithms still only use a limited number of queries: they require only a number of
queries polynomial in the dimension of the instance space. Moreover, our K-step
MultiLineSearch (Algorithm 3) solves the linear case with fewer queries than the
previously published reverse-engineering technique.



Dalvi et al. (2004) use a cost-sensitive game-theoretic approach to preemptively patch a
classifier's blind spots. They construct a modified classifier designed to
detect optimally modified instances. This work is complementary to our own; we examine
optimal evasion strategies while they have studied mechanisms for adapting the classifier.
In this paper we assume the classifier is not adapting during evasion.

A number of authors have studied evading sequence-based intrusion detector systems
(IDSs) (Tan et al., 2002; Wagner and Soto, 2002). In exploring mimicry attacks these
authors demonstrated that real IDSs can be fooled by modifying exploits to mimic normal
behaviors. These authors used offline analysis of the IDSs to construct their modifications;
by contrast, our modifications are optimized by querying the classifier.



The field of active learning also studies a form of query-based optimization (Schohn and
Cohn, 2000). While active learning and near-optimal evasion are similar in their exploration of
querying strategies, the objectives for these two settings are quite different (see Section 2.3).



2. Problem Setup 

We begin by introducing our notation and our assumptions. First, we assume that instances
are represented in a feature space $\mathcal{X}$ which is $D$-dimensional Euclidean space,$^{1}$
$\mathcal{X} = \mathbb{R}^D$. Each component of an instance $x \in \mathcal{X}$ is a feature which we denote
as $x_d$. We denote each coordinate vector of the form $(0, \ldots, 1, \ldots, 0)$ with a 1 only at the
$d$-th feature as $\delta_d$. We assume that the feature space representation is known to the
adversary and there are no restrictions on the adversary's queries; i.e., any point in feature
space $\mathcal{X}$ can be queried by the adversary. These assumptions may not be true in every
real-world setting, but they allow us to investigate strategies taken by a worst-case adversary.
We revisit this assumption in Section 5.

1. Lowd and Meek also consider integer and Boolean-valued instance spaces and derive results for several
classes of Boolean-valued learners.

We further assume the target classifier $f$ belongs to a family of classifiers $\mathcal{F}$. Any
classifier $f \in \mathcal{F}$ is a mapping from feature space $\mathcal{X}$ to its response space $\mathcal{Y}$; i.e.,
$f : \mathcal{X} \to \mathcal{Y}$. We assume the adversary's attack will be against a fixed $f$ so the learning
method and the training data used to select $f$ are irrelevant. We assume the adversary does
not know $f$ but knows its family $\mathcal{F}$. We also restrict our attention to binary classifiers and
use $\mathcal{Y} = \{'-', '+'\}$.

We assume $f \in \mathcal{F}$ is deterministic and so it partitions $\mathcal{X}$ into 2 sets: the positive class
$\mathcal{X}_f^+ = \{x \in \mathcal{X} \mid f(x) = '+'\}$ and the negative class $\mathcal{X}_f^- = \{x \in \mathcal{X} \mid f(x) = '-'\}$. We take
the negative set to be normal instances. We assume that the adversary is aware of at least
one instance in each class, $x^- \in \mathcal{X}_f^-$ and $x^A \in \mathcal{X}_f^+$, and can observe $f(x)$ for any $x$ by issuing
a membership query (see Section ?? for a more detailed discussion of this assumption).

2.1 Adversarial Cost 

We assume the adversary has a notion of utility over the instance space which we quantify
with a cost function $A : \mathcal{X} \to \mathbb{R}^{0+}$; e.g., for a spammer this could be edit distance on
email messages. The adversary wishes to optimize $A$ over the negative class, $\mathcal{X}_f^-$; e.g., the
spammer wants to send spam that will be classified as normal email ('-') rather than as
spam ('+'). We assume this cost function is a distance to some instance $x^A \in \mathcal{X}_f^+$ that is
most desirable to the adversary. We focus on the general class of weighted $\ell_p$ ($0 < p \le \infty$)
cost functions:

$$A_p^{(c)}(x) = \left( \sum_{d=1}^{D} c_d \, \left| x_d - x_d^A \right|^p \right)^{1/p} , \qquad (1)$$

where $0 < c_d < \infty$ is the relative cost the adversary associates with the $d$-th feature. We
also consider the cases when some features have $c_d = 0$ (adversary doesn't care about the
$d$-th feature) or $c_d = \infty$ (adversary requires the $d$-th feature to match $x_d^A$). Weighted $\ell_1$ costs
are particularly appropriate for many adversarial problems since costs are assessed based
on the degree to which a feature is altered and the adversary typically is interested in some
features more than others. Unless stated otherwise, we take "$\ell_1$ cost" to mean a weighted $\ell_1$
cost in the sequel. The $\ell_1$-norm is a natural measure of edit distance for email spam, while
larger weights can model tokens that are more costly to remove (e.g., a payload URL). As
with Lowd and Meek, we focus primarily on $\ell_1$ costs in Section 3 before exploring general
$\ell_p$ costs in Section 4. We use $\mathbb{B}^C(A) = \{x \in \mathcal{X} \mid A(x) \le C\}$ to denote the cost-ball (or
sublevel set) with cost no more than $C$. For instance, $\mathbb{B}^C(A_1)$ is the set of instances that
do not exceed an $\ell_1$ cost of $C$ from the target $x^A$.
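As an illustration of the weighted cost and the cost-ball, the following minimal Python sketch evaluates a weighted $\ell_p$ cost and tests membership in a cost-ball. It assumes the form in which each weight multiplies the $p$-th power of the coordinate difference, finite $p$, and finite positive weights; the function names `weighted_lp_cost` and `in_cost_ball` are hypothetical and chosen only for this example.

```python
def weighted_lp_cost(x, x_target, weights, p=1.0):
    """Weighted l_p cost of x relative to the target instance.

    Assumed form: (sum_d c_d * |x_d - target_d|**p) ** (1/p), finite p > 0.
    """
    if not (len(x) == len(x_target) == len(weights)):
        raise ValueError("x, x_target, and weights must have equal length")
    total = sum(c * abs(xd - td) ** p
                for xd, td, c in zip(x, x_target, weights))
    return total ** (1.0 / p)


def in_cost_ball(x, x_target, weights, cost, p=1.0):
    """True if x lies in the cost-ball of radius `cost` around the target."""
    return weighted_lp_cost(x, x_target, weights, p) <= cost
```

For example, with weights `[2.0, 1.0]` a unit change in the first feature costs twice as much as a unit change in the second, which is how a spammer might model a token that is expensive to remove.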

Lowd and Meek (2005) define minimal adversarial cost (MAC) of a classifier $f$ to be
the value

$$\mathrm{MAC}(f, A) = \inf_{x \in \mathcal{X}_f^-} \left[ A(x) \right] ;$$

i.e., the greatest lower bound on the cost obtained by any negative instance. They further
define a data point to be an $\epsilon$-approximate instance of minimal adversarial cost ($\epsilon$-IMAC)
if it is a negative instance with a cost no more than a factor $(1 + \epsilon)$ of the MAC; i.e., every
$\epsilon$-IMAC is a member of the set$^{2}$

$$\epsilon\text{-IMAC}(f, A) \triangleq \left\{ x \in \mathcal{X}_f^- \;\middle|\; A(x) \le (1 + \epsilon) \cdot \mathrm{MAC}(f, A) \right\} . \qquad (2)$$

The adversary's goal is to find an $\epsilon$-IMAC efficiently, while issuing as few queries as possible.

2.2 Search Terminology 

The notion of near-optimality introduced in Eq. (2) is that of multiplicative optimality; i.e.,
an $\epsilon$-IMAC must have a cost within a factor of $(1 + \epsilon)$ of the MAC. However, the results
of this paper can also be immediately adopted for additive optimality in which we seek
instances with cost no more than $\eta > 0$ greater than the MAC. To differentiate between
these notions of optimality, we will use the notation $\epsilon$-IMAC$^{(*)}$ to refer to the set in Eq. (2)
and define an analogous set $\eta$-IMAC$^{(+)}$ for additive optimality as

$$\eta\text{-IMAC}^{(+)}(f, A) \triangleq \left\{ x \in \mathcal{X}_f^- \;\middle|\; A(x) \le \eta + \mathrm{MAC}(f, A) \right\} . \qquad (3)$$

We use the terms $\epsilon$-IMAC$^{(*)}$ and $\eta$-IMAC$^{(+)}$ to refer both to the sets defined in Eq. (2)
and (3) as well as the members of them; the usage will be clear from the context.

Either notion of optimality allows us to efficiently use bounds on the MAC to find
an $\epsilon$-IMAC$^{(*)}$ or an $\eta$-IMAC$^{(+)}$. Suppose there is a negative instance, $x^-$, with cost $C^-$
and all instances with cost no more than $C^+$ are positive; i.e., $C^-$ is an upper bound
and $C^+$ is a lower bound on the MAC: $C^+ \le \mathrm{MAC}(f, A) \le C^-$. Then the negative
instance $x^-$ is $\epsilon$-multiplicatively optimal if $C^- / C^+ \le (1 + \epsilon)$ whereas it is $\eta$-additively
optimal if $C^- - C^+ \le \eta$. In the sequel, we will consider algorithms that can achieve either
additive or multiplicative optimality. These algorithms employ binary search strategies to
iteratively reduce the gap between any $C^-$ and $C^+$. Namely, if we can determine whether
an intermediate cost establishes a new upper or lower bound on MAC, then our binary
search strategies can iteratively reduce the $t$-th gap between $C_t^-$ and $C_t^+$. We now provide
common terminology for the binary search and in Section 3 we use convexity to establish a
new bound at each iteration.



Lemma 1 If an algorithm can provide bounds $C^+ \le \mathrm{MAC}(f, A) \le C^-$, then this algorithm
has achieved (1) $(C^- - C^+)$-additive optimality and (2) $\left( \frac{C^-}{C^+} - 1 \right)$-multiplicative optimality.

In the $t$-th iteration of an additive binary search, the additive gap between the $t$-th bounds
is given by $G_t^{(+)} = C_t^- - C_t^+$ with $G_0^{(+)}$ defined accordingly by the initial bounds $C_0^-$ and
$C_0^+$. The search uses a proposal step of $C_t = (C_t^- + C_t^+)/2$, a stopping criterion of $G_t^{(+)} \le \eta$,
and achieves $\eta$-additive optimality in

$$L_\eta^{(+)} = \left\lceil \log_2 \left[ \frac{G_0^{(+)}}{\eta} \right] \right\rceil \qquad (4)$$

steps. Binary search has the best worst-case query complexity for achieving $\eta$-additive
optimality.



2. We use "$\epsilon$-IMAC" to refer both to this set and its members. The meaning will be clear from the context.

Binary search can also be used for multiplicative optimality by searching in exponential
space. By rewriting our upper and lower bounds as $C^- = 2^a$ and $C^+ = 2^b$, the multiplicative
optimality condition becomes $a - b \le \log_2(1 + \epsilon)$, an additive optimality condition. Thus,
binary search on the exponent achieves $\epsilon$-multiplicative optimality and does so with the
fewest queries. The multiplicative gap of the $t$-th iteration is $G_t^{(*)} = C_t^- / C_t^+$ with $G_0^{(*)}$
defined accordingly by the initial bounds $C_0^-$ and $C_0^+$. The $t$-th query is $C_t = \sqrt{C_t^- \cdot C_t^+}$,
the stopping criterion is $G_t^{(*)} \le 1 + \epsilon$, and the search achieves $\epsilon$-multiplicative optimality in

$$L_\epsilon^{(*)} = \left\lceil \log_2 \left[ \frac{\log_2 \left( G_0^{(*)} \right)}{\log_2(1 + \epsilon)} \right] \right\rceil \qquad (5)$$

steps. Multiplicative optimality only makes sense when both $C_0^-$ and $C_0^+$ are strictly
positive.

Binary searches for additive and multiplicative optimality differ in their proposal step
and their stopping criterion. For additive optimality, the proposal is the arithmetic mean
$C_t = (C_t^- + C_t^+)/2$ and search stops when $G_t^{(+)} \le \eta$, whereas for multiplicative optimality,
the proposal is the geometric mean $C_t = \sqrt{C_t^- \cdot C_t^+}$ and search stops when $G_t^{(*)} \le 1 + \epsilon$.

For the remainder of this paper, we will address $\epsilon$-multiplicative optimality for an $\epsilon$-IMAC
(except where explicitly noted) and define $L_\epsilon = L_\epsilon^{(*)}$ and $G_t = G_t^{(*)}$. Nonetheless, our
algorithms are immediately adapted to additive optimality by simply changing the proposal
step, stopping condition, and the definitions of $L_\epsilon$ and $G_t$.
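The two binary searches can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual ceiling-of-logarithm step counts for halving a gap and a hypothetical oracle `is_lower_bound(c)` that reports whether an intermediate cost `c` is a lower bound on the MAC; the function names are introduced here for the example and do not come from the paper.

```python
import math


def additive_steps(gap0, eta):
    """Halvings needed to shrink an additive gap gap0 to at most eta."""
    return max(0, math.ceil(math.log2(gap0 / eta)))


def multiplicative_steps(ratio0, eps):
    """Halvings of the log-gap needed to shrink ratio0 to at most 1 + eps."""
    return max(0, math.ceil(math.log2(math.log2(ratio0) / math.log2(1.0 + eps))))


def multiplicative_search(is_lower_bound, c_plus, c_minus, eps):
    """Geometric-mean binary search on the cost.

    is_lower_bound(c) is a hypothetical oracle: True if c lower-bounds the MAC,
    False if c upper-bounds it. Requires 0 < c_plus <= MAC <= c_minus.
    Returns the final bounds and the number of steps taken.
    """
    steps = 0
    while c_minus / c_plus > 1.0 + eps:
        c = math.sqrt(c_plus * c_minus)  # geometric-mean proposal
        if is_lower_bound(c):
            c_plus = c
        else:
            c_minus = c
        steps += 1
    return c_plus, c_minus, steps
```

Switching to additive optimality only requires replacing the proposal with the arithmetic mean and the loop condition with `c_minus - c_plus > eta`.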

2.3 Near-Optimal Evasion 

Lowd and Meek (2005) introduce the concept of adversarial classifier reverse engineering
(ACRE) learnability to quantify the difficulty of finding an $\epsilon$-IMAC instance for a particular
family of classifiers $\mathcal{F}$, and a family of adversarial costs $\mathcal{A}$. Using our notation, their
definition of ACRE $\epsilon$-learnable is

A set of classifiers $\mathcal{F}$ is ACRE $\epsilon$-learnable under a set of cost functions $\mathcal{A}$ if
an algorithm exists such that for all $f \in \mathcal{F}$ and $A \in \mathcal{A}$, it can find an $x \in$
$\epsilon$-IMAC$(f, A)$ using only polynomially many membership queries in $D$, the
encoded size of $f$, and the encoded size of $x^+$ and $x^-$.

In generalizing their result, we slightly alter their definition of query complexity. First,
to quantify query complexity we only use the dimension $D$ and the number of steps $L_\epsilon$
required by a univariate binary search to narrow the gap between initial bounds $C_0^+$ and $C_0^-$
to less than $(1 + \epsilon)$.$^{3}$ Second, we assume the adversary only has two initial points $x^- \in \mathcal{X}_f^-$
and $x^A \in \mathcal{X}_f^+$ (the original setting required a third $x^+ \in \mathcal{X}_f^+$); we restrict our setting to



3. Using the encoded sizes of $f$, $x^+$, and $x^-$ in defining $\epsilon$-IMAC searchable is problematic. For our purposes,
it is clear that the encoded size of both $x^+$ and $x^-$ is $D$ so it is unnecessary to include additional terms
for their size. Further, we allow for families of non-parametric classifiers for which the notion of encoding
size is ill-defined but is also unnecessary for the algorithms we present. In extending beyond linear and
parametric families of classifiers, it is not straightforward to define the encoding size of our classifier $f$.
One could use notions such as the VC-dimension of $\mathcal{F}$ or its covering number (Anthony and Bartlett,



Query Strategies for Evading Convex-Inducing Classifiers 



the case of $x^A \in \mathcal{X}_f^+$, yielding simpler search procedures.$^{4}$ Finally, our algorithms do not
reverse engineer the decision boundary, so "ACRE" would be a misnomer here. Instead we
refer to the overall problem as Near-Optimal Evasion and replace ACRE $\epsilon$-learnable with
the following definition of $\epsilon$-IMAC searchable.

A family of classifiers $\mathcal{F}$ is $\epsilon$-IMAC searchable under a family of cost functions
$\mathcal{A}$ if for all $f \in \mathcal{F}$ and $A \in \mathcal{A}$, there is an algorithm that finds $x \in \epsilon$-IMAC$(f, A)$
using polynomially many membership queries in $D$ and $L_\epsilon$. We will refer to such
an algorithm as efficient.

Unlike Lowd and Meek's approach, our algorithms construct queries to provably find
an $\epsilon$-IMAC without reverse engineering the classifier's decision boundary. Efficient
query-based reverse engineering for $f \in \mathcal{F}$ is sufficient for minimizing $A$ over the estimated negative
space. However, reverse engineering (active learning) is generally an expensive approach for
near-optimal evasion, requiring query complexity that is exponential in the feature space
dimension for general convex classes (Rademacher and Goyal, 2009), while finding an
$\epsilon$-IMAC need not be; the requirements for finding an $\epsilon$-IMAC differ significantly from the
objectives of reverse engineering approaches such as active learning. Both approaches use
queries to reduce the size of version space $\hat{\mathcal{F}} \subset \mathcal{F}$, the set of classifiers consistent with
the adversary's membership queries. However, reverse engineering approaches minimize the
expected number of disagreements between members of $\hat{\mathcal{F}}$. In contrast, to find an $\epsilon$-IMAC,
we only need to provide a single instance $x^\dagger \in \epsilon$-IMAC$(f, A)$ for all $f \in \hat{\mathcal{F}}$, while leaving
the classifier largely unspecified; i.e.,

$$\bigcap_{f \in \hat{\mathcal{F}}} \epsilon\text{-IMAC}(f, A) \ne \emptyset .$$

This objective allows the classifier to be unspecified in much of $\mathcal{X}$. We present algorithms
for $\epsilon$-IMAC search on a family of classifiers that generally cannot be efficiently reverse
engineered. The queries we construct necessarily elicit an $\epsilon$-IMAC only; the classifier itself
will be underspecified in large regions of $\mathcal{X}$, so our techniques do not reverse engineer the
classifier.

2.4 Multiplicative vs. Additive Optimality 

Additive and multiplicative optimality are intrinsically related by the fact that the
optimality condition for multiplicative optimality $C^- / C^+ \le 1 + \epsilon$ can be rewritten as the additive
optimality condition $\log_2 C^- - \log_2 C^+ \le \log_2(1 + \epsilon)$. From this equivalence we can take
$\eta = \log_2(1 + \epsilon)$ and use the additive optimality criterion on the logarithm of the cost.
However, this equivalence also leads to two differences between these notions of optimality.



1999) but it is unclear why size of the classifier is important in quantifying the complexity of $\epsilon$-IMAC
search. Moreover, as we demonstrate in this paper, there are non-parametric families of classifiers for
which $\epsilon$-IMAC search is polynomial in $D$ alone.

4. However, as is apparent in the algorithms we demonstrate, using $x^+ = x^A$ makes the attacker less covert
since it is significantly easier to infer the attacker's intentions based on their queries. (Covertness is not
an explicit goal in $\epsilon$-IMAC search but it would be a requirement of many real-world attackers.) However,
since our goal is not to design real attacks but rather analyze the best possible attack so as to understand
our classifier's vulnerabilities, covertness can be ignored.

First, multiplicative optimality only makes sense when $C_0^+$ is strictly positive (we will
need this assumption for our algorithms) whereas additive optimality can still be achieved
if $C_0^+ = 0$. In this special case, $x^A$ is on the boundary of $\mathcal{X}_f^+$ and there is no $\epsilon$-IMAC$^{(*)}$
for any $\epsilon > 0$. Practically speaking though, this is a minor hindrance: as we demonstrate
in Section 3.1.3 there is an algorithm that can efficiently establish any lower bound $C_0^+$ if
such a lower bound exists.

Second, the additive optimality criterion is not scale invariant (i.e., any instance $x^\dagger$ that
satisfies the optimality criterion for cost $A$ also satisfies it for $A'(x) = s \cdot A(x)$ for any $s > 0$)
whereas multiplicative optimality is scale invariant. Additive optimality is, however, shift
invariant (i.e., any instance $x^\dagger$ that satisfies the optimality criterion for cost $A$ also satisfies
it for $A'(x) = s + A(x)$ for any $s > 0$) whereas multiplicative optimality is not. Scale
invariance is typically more salient because if the cost function is also scale invariant (all
proper norms are) then the optimality condition is invariant to a rescaling of the underlying
feature space; e.g., a change in units for all features. Thus, multiplicative optimality is a
unitless notion of optimality whereas additive optimality is not. The following result is a
consequence of additive optimality's lack of scale invariance.

Theorem 2 If for some hypothesis space $\mathcal{F}$, cost function $A$, and any initial bounds
$0 < C_0^+ < C_0^-$ on the $\mathrm{MAC}(f, A)$ for some $f \in \mathcal{F}$, there exists some $\bar{\epsilon} > 0$ such that no efficient
query-based algorithm can find an $\epsilon$-IMAC$^{(*)}$ for any $0 < \epsilon \le \bar{\epsilon}$, then there is no efficient
query-based algorithm that can find an $\eta$-IMAC$^{(+)}$ for any $0 < \eta \le \bar{\epsilon} \cdot C_0^-$.

Proof We will proceed by contraposition. If there is an efficient query-based
algorithm that can find an $x \in \eta$-IMAC$^{(+)}$ for some $0 < \eta \le \bar{\epsilon} \cdot C_0^-$, then, by definition of
$\eta$-IMAC$^{(+)}$, $A(x) \le \eta + \mathrm{MAC}(f, A)$. Taking $\eta = \epsilon \cdot \mathrm{MAC}(f, A)$ for some $\epsilon > 0$, we have
equivalently achieved $A(x) \le (1 + \epsilon) \, \mathrm{MAC}(f, A)$; i.e., $x \in \epsilon$-IMAC$^{(*)}$. Moreover, since
$\mathrm{MAC}(f, A) \le C_0^-$, this efficient algorithm is able to find an $\epsilon$-IMAC$^{(*)}$ for some $\epsilon \le \bar{\epsilon}$. ■

Corollary 3 If for some hypothesis space $\mathcal{F}$, cost function $A$, there exists some $\bar{\epsilon} > 0$ such
that no efficient query-based algorithm can find an $\epsilon$-IMAC$^{(*)}$ for any $0 < \epsilon \le \bar{\epsilon}$, then there
is no efficient query-based algorithm that can find an $\eta$-IMAC$^{(+)}$ for any $\eta$.

Proof This follows from Theorem 2 since $C_0^-$ may be arbitrarily large and $\bar{\epsilon} > 0$. ■

This corollary demonstrates that the lack of scale invariance in the additive optimality 
condition allows for the feature space to be arbitrarily rescaled until any fixed level of 
additive optimality can no longer be achieved; i.e., the units of the cost determine whether 
a particular level of additive accuracy can be achieved whereas multiplicative costs are 
unitless. 
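The effect of rescaling on the two criteria can be checked numerically. The short Python sketch below is illustrative only; the helper names and the example numbers are assumptions introduced for this example. It shows an instance that satisfies both criteria in one unit system but fails the additive criterion after all costs are multiplied by 100, while the multiplicative criterion is unaffected.

```python
def is_multiplicatively_optimal(cost, mac, eps):
    """Multiplicative criterion: cost within a factor (1 + eps) of the MAC."""
    return cost <= (1.0 + eps) * mac


def is_additively_optimal(cost, mac, eta):
    """Additive criterion: cost no more than eta above the MAC."""
    return cost <= eta + mac


# Hypothetical numbers: an instance of cost 1.05 when the MAC is 1.0.
cost, mac = 1.05, 1.0
scale = 100.0  # e.g., a change of units applied to every feature

before = (is_multiplicatively_optimal(cost, mac, 0.1),
          is_additively_optimal(cost, mac, 0.1))
after = (is_multiplicatively_optimal(scale * cost, scale * mac, 0.1),
         is_additively_optimal(scale * cost, scale * mac, 0.1))
```

Here `before` is `(True, True)` and `after` is `(True, False)`: the same instance stops being 0.1-additively optimal once the units change.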

3. Evasion of Convex Classes 

We generalize $\epsilon$-IMAC searchability to the family of convex-inducing classifiers $\mathcal{F}^{\mathrm{convex}}$ that
partition the feature space $\mathcal{X}$ into a positive and negative class, one of which is convex. The
convex-inducing classifiers include the linear classifiers studied by Lowd and Meek (2005),
anomaly detectors using bounded PCA (Lakhina et al., 2004) and that use hyper-sphere
boundaries (Bishop, 2006), one-class classifiers that predict anomalies by thresholding the
log-likelihood of a log-concave (or uni-modal) density function, and quadratic classifiers of
the form $x^\top A x + b^\top x + c \ge 0$ if $A$ is semidefinite. The convex-inducing classifiers also
include complicated bodies such as any intersections of a countable number of halfspaces,
cones, or balls.

Figure 1: Geometry of convex sets and $\ell_1$ balls. (a) If the positive set $\mathcal{X}_f^+$ is convex, finding
an $\ell_1$ ball contained within $\mathcal{X}_f^+$ establishes a lower bound on the cost, otherwise at
least one of the $\ell_1$ ball's corners witnesses an upper bound. (b) If the negative set
$\mathcal{X}_f^-$ is convex, we can establish upper and lower bounds on the cost by determining
whether or not an $\ell_1$ ball intersects with $\mathcal{X}_f^-$, but this intersection need not include
any corner of the ball.

Restricting $\mathcal{F}$ to be the family of convex-inducing classifiers simplifies $\epsilon$-IMAC search.
When the negative class $\mathcal{X}_f^-$ is convex, the problem reduces to minimizing a (convex)
function $A$ constrained to a convex set; if $\mathcal{X}_f^-$ were known to the adversary, this simply
corresponds to solving a convex program. When the positive class $\mathcal{X}_f^+$ is convex, however,
our task is to minimize the (convex) function $A$ outside of a convex set; this is generally a
hard problem (cf. Section 4.1.4 where we show that minimizing $\ell_2$ cost can require
exponential query complexity). Nonetheless, for certain cost functions $A$, it is easy to determine
whether a particular cost ball $\mathbb{B}^C(A)$ is completely contained within a convex set. This
leads to efficient approximation algorithms.

We construct efficient algorithms for query-based optimization of the $\ell_1$ cost of Eq. (1)
for the convex-inducing classifiers. There appears to be an asymmetry depending on whether
the positive or negative class is convex as illustrated in Figure 1. When the positive set is
convex, determining whether an $\ell_1$ ball $\mathbb{B}^C(A_1) \subset \mathcal{X}_f^+$ only requires querying the vertices
of the ball as depicted in Figure 1(a). When the negative set is convex, determining whether
or not $\mathbb{B}^C(A_1) \cap \mathcal{X}_f^- = \emptyset$ is non-trivial since the intersection need not occur at a vertex as
depicted in Figure 1(b). We present an efficient algorithm for optimizing an $\ell_1$ cost when
$\mathcal{X}_f^+$ is convex and a polynomial random algorithm for optimizing any convex cost when $\mathcal{X}_f^-$
is convex.

The algorithms we present achieve multiplicative optimality via binary search. We use
Eq. (5) to define $L_\epsilon$ as the number of phases required by our binary search to reduce the
multiplicative gap to less than $1 + \epsilon$. We also use $C_0^- = A(x^-)$ as an initial upper bound
on the MAC and assume there is some $C_0^+ > 0$ that lower bounds the MAC (i.e., $x^A$ is in
the interior of $\mathcal{X}_f^+$). This condition eliminates the case where $x^A$ is on the boundary of $\mathcal{X}_f^+$
where $\mathrm{MAC}(f, A) = 0$ and $\epsilon$-IMAC$(f, A) = \emptyset$; in this degenerate case, no algorithm can
find an $\epsilon$-IMAC since there are negative instances arbitrarily close to $x^A$.

3.1 $\epsilon$-IMAC Search for a Convex $\mathcal{X}_f^+$

Solving the $\epsilon$-IMAC Search problem when $\mathcal{X}_f^+$ is convex is hard in the general case of a
convex cost $A(\cdot)$. We demonstrate algorithms for the $\ell_1$ cost of Eq. (1) that solve the problem as a
binary search. Namely, given initial costs $C_0^+$ and $C_0^-$ that bound the MAC, our algorithm
can efficiently determine whether $\mathbb{B}^{C_t}(A_1) \subset \mathcal{X}_f^+$ for any intermediate cost $C_t^+ < C_t < C_t^-$.
If the $\ell_1$ ball is contained in $\mathcal{X}_f^+$, then $C_t$ becomes the new lower bound $C_{t+1}^+$. Otherwise $C_t$
becomes the new upper bound $C_{t+1}^-$. Since our objective Eq. (2) is to obtain multiplicative
optimality, our steps will be $C_t = \sqrt{C_t^+ \cdot C_t^-}$. We now explain how we exploit the properties
of the $\ell_1$ ball and convexity of $\mathcal{X}_f^+$ to efficiently determine whether $\mathbb{B}^C(A_1) \subset \mathcal{X}_f^+$ for any $C$.
We also discuss practical aspects of our algorithm and extensions to other $\ell_p$ cost functions.

The existence of an efficient query algorithm relies on three facts: (1) $x^A \in \mathcal{X}_f^+$; (2)
every $\ell_1$ cost $C$-ball centered at $x^A$ intersects with $\mathcal{X}_f^-$ only if at least one of its vertices
is in $\mathcal{X}_f^-$; and (3) $C$-balls of $\ell_1$ costs only have $2 \cdot D$ vertices. The vertices of the $\ell_1$ ball
$\mathbb{B}^C(A_1)$ are axis-aligned instances differing from $x^A$ in exactly one feature (e.g., the $d$-th
feature) and can be expressed in the form

$$x^A \pm \frac{C}{c_d} \cdot \delta_d , \qquad (6)$$

which belongs to the $C$-ball of our $\ell_1$ cost (the coefficient $\frac{C}{c_d}$ normalizes for the weight $c_d$
on the $d$-th feature). We now formalize the second fact as follows.



Lemma 4 For all $C > 0$, if there exists some $x \in \mathcal{X}_f^-$ that achieves a cost of $C = A_1^{(c)}(x)$,
then there is some feature $d$ such that a vertex of the form of Eq. (6) is in $\mathcal{X}_f^-$ (and also
achieves cost $C$ by Eq. (1)).

Proof Suppose not; then there is some $x \in \mathcal{X}_f^-$ such that $A_1^{(c)}(x) = C$ and $x$ has $M \ge 2$
features that differ from $x^A$ (if $x$ only differs in 1 feature it would be of the form of Eq. (6)).
Let $\{d_1, \ldots, d_M\}$ be the differing features and let $b_{d_i} = \mathrm{sign}(x_{d_i} - x^A_{d_i})$ be the sign of the
difference between $x$ and $x^A$ along the $d_i$-th feature. For each $d_i$, let
$e_{d_i} = x^A + \frac{C}{c_{d_i}} \cdot b_{d_i} \cdot \delta_{d_i}$
be a vertex of the form of Eq. (6) which has a cost $C$ (from Eq. (1)). The $M$ vertices $e_{d_i}$
form an $M$-dimensional equi-cost simplex of cost $C$ on which $x$ lies; i.e., $x = \sum_{i=1}^{M} \alpha_i e_{d_i}$
for some $0 \le \alpha_i \le 1$. If all $e_{d_i} \in \mathcal{X}_f^+$, then the convexity of $\mathcal{X}_f^+$ implies that all points in
their simplex are in $\mathcal{X}_f^+$ and so $x \in \mathcal{X}_f^+$, which violates our premise. Thus, if any instance
in $\mathcal{X}_f^-$ achieves cost $C$, there is always a vertex of the form Eq. (6) in $\mathcal{X}_f^-$ that also achieves
cost $C$. ■

Figure 2: The geometry of search. (a) Weighted $\ell_1$ balls are centered around the target $x^A$
and have $2 \cdot D$ vertices; (b) Search directions in multi-line search radiate from $x^A$
to probe specific costs; (c) In general, we leverage convexity of the cost function
when searching to evade. By probing all search directions at a specific cost, the
convex hull of the positive queries bounds the $\ell_1$ cost ball contained within it.



As a consequence, if all such vertices of any $C$-ball $\mathbb{B}^C(A_1)$ are positive, then all $x$ with
$A_1^{(c)}(x) \le C$ are positive, thus establishing $C$ as a lower bound on the MAC. Conversely, if
any of the vertices of $\mathbb{B}^C(A_1)$ are negative, then $C$ is an upper bound on MAC. Thus, by
simultaneously querying all $2 \cdot D$ equi-cost vertices of $\mathbb{B}^C(A_1)$, we either establish $C$ as a
new lower or upper bound on the MAC. By performing a binary search on $C$ we iteratively
shrink the multiplicative gap between our bounds (halving its logarithm at each step) until
it is within a factor of $1 + \epsilon$. This yields an $\epsilon$-IMAC of the form of Eq. (6).
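The vertex test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm listing: `classify` stands in for the membership oracle and returns '+' or '-', the weights are assumed finite and strictly positive, and the function names are hypothetical.

```python
def l1_ball_vertices(x_target, weights, cost):
    """Yield the 2*D vertices x_target +/- (cost / c_d) * delta_d."""
    for d, c_d in enumerate(weights):
        for sign in (1.0, -1.0):
            vertex = list(x_target)
            vertex[d] += sign * cost / c_d
            yield vertex


def cost_is_lower_bound(classify, x_target, weights, cost):
    """True if every vertex of the cost-ball is positive.

    With a convex positive class, all-positive vertices imply the whole
    l_1 ball is positive, so `cost` lower-bounds the MAC. Because all()
    stops at the first negative vertex, at most 2*D queries are issued.
    """
    return all(classify(v) == '+'
               for v in l1_ball_vertices(x_target, weights, cost))
```

A `False` result means some vertex was negative, so `cost` is an upper bound on the MAC and that vertex is itself a candidate evading instance.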

A general form of this multiline search procedure is presented as Algorithm 1 and
depicted in Figure 2. MultiLineSearch simultaneously searches along the directions in a
set $\mathbb{W}$ of search directions that radiate from their origin at $x^A$ and that are unit vectors
for their cost; i.e., $A(w) = 1$ for any $w \in \mathbb{W}$. (We transform a given set of non-normalized
search vectors $\{v\}$ into unit search vectors by simply applying a normalization constant
of $A(v)^{-1}$ to each vector.) At each step of MultiLineSearch, at most $|\mathbb{W}|$ queries are
issued in order to construct a bounding shell (i.e., the convex hull of these queries will
either form an upper or lower bound on the MAC) to determine whether $\mathbb{B}^C(A) \subset \mathcal{X}_f^+$.
Once a negative instance is found at cost $C$, we cease further queries at cost $C$ since a
single negative instance is sufficient to establish an upper bound. We call this policy lazy
querying.$^{5}$ Further, when an upper bound is established for a cost $C$ (a negative vertex is
found), our algorithm prunes all directions that were positive at cost $C$. This pruning is
sound; by the convexity assumption these pruned directions are positive for all costs less
than the new upper bound $C$ on the MAC. Finally, by performing a binary search on the
cost, MultiLineSearch finds an $\epsilon$-IMAC with no more than $|\mathbb{W}| \cdot L_\epsilon$ queries but at least
$|\mathbb{W}| + L_\epsilon$ queries. Thus, this algorithm is $O(|\mathbb{W}| \cdot L_\epsilon)$ for $\ell_1$ costs.

5. We could continue querying at any distance B where there is a known negative instance as it may allow
us to prune other search directions quickly. However, once the classifier reveals a negative instance at

It is worth noting that, in its present form, MultiLineSearch has two implicit
assumptions. First, we assume all search directions radiate from a common origin, $\mathbf{x}^A$, and
$A(\mathbf{x}^A) = 0$. Without this assumption, the ray-constrained cost function $A(\mathbf{x} + s \cdot \mathbf{w})$ is
still convex in $s > 0$ but not necessarily monotonic as required for binary search. Second,
we assume the cost function $A$ is a positive homogeneous function along any ray from $\mathbf{x}^A$;
i.e., $A(\mathbf{x}^A + s \cdot \mathbf{w}) = |s| \cdot A(\mathbf{x}^A + \mathbf{w})$. This assumption allows MultiLineSearch to scale
its unit search vectors to achieve the same scaling of their cost. Although the algorithm
could be adapted to eliminate these assumptions, the cost functions in Eq. (1) satisfy both
assumptions since they are norms centered at $\mathbf{x}^A$.
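The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `classify` is a hypothetical membership oracle returning `True` for '+' and `False` for '-', the directions are assumed to be unit-cost vectors from a common origin `x_a`, and the toy box classifier at the end is invented purely to exercise the code.

```python
import math


def multi_line_search(classify, x_a, directions, c_plus, c_minus, eps):
    """Sketch of multi-line search with lazy querying and pruning.

    classify(x) returns True for '+' and False for '-'.
    c_plus and c_minus are initial lower and upper bounds on the MAC.
    """
    active = list(directions)
    best = None      # cheapest negative instance seen so far
    queries = 0
    while c_minus / c_plus > 1.0 + eps:
        cost = math.sqrt(c_plus * c_minus)   # geometric midpoint of the bounds
        positive = []
        negative_found = False
        for w in active:
            x = tuple(a + cost * wi for a, wi in zip(x_a, w))
            queries += 1
            if classify(x):
                positive.append(w)
            else:
                best = x
                negative_found = True
                break                        # lazy querying: one negative suffices
        if negative_found:
            c_minus = cost                   # new upper bound on the MAC
            # Prune directions that were positive at the new upper bound.
            active = [w for w in active if w not in positive]
        else:
            c_plus = cost                    # every remaining direction was positive
    return best, c_plus, c_minus, queries


def box_classifier(x):
    """Hypothetical convex positive set: the box |x0| <= 2, |x1| <= 0.5."""
    return abs(x[0]) <= 2.0 and abs(x[1]) <= 0.5


L1_DIRECTIONS = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
```

Calling `multi_line_search(box_classifier, (0.0, 0.0), L1_DIRECTIONS, 0.1, 4.0, 0.01)` brackets the minimal $\ell_1$ cost of leaving this toy box (0.5) to within the requested factor. The lower bound it returns is only valid when the chosen directions bound the cost ball, as the axis-aligned directions do for an $\ell_1$ cost.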

Algorithm 2 uses MultiLineSearch for $\ell_1$ costs by making $\mathcal{W}$ be the vertices of the
unit-cost $\ell_1$ ball centered at $\mathbf{x}^A$. In this case, the search issues at most $2 \cdot D$ queries to
determine whether $\mathbb{B}^C(A_1) \subset \mathcal{X}_f^+$ and so Algorithm 2 is $O(L_\epsilon^{(*)} \cdot D)$. However,
MultiLineSearch does not rely on its directions being vertices of the $\ell_1$ ball although those vertices
are sufficient to span the $\ell_1$ ball. Generally, MultiLineSearch is agnostic to the configuration
of its search directions and can be adapted for any set of directions that can provide
a bound on the cost using the convexity of $\mathcal{X}_f^+$. However, as we show in Section 4, the
number of search directions required to bound an $\ell_p$ cost ball for $p > 1$ can be exponential in $D$.
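As an illustration of how such a direction set might be built, the following Python sketch constructs the $2 \cdot D$ vertices of the unit-cost ball for a weighted $\ell_1$ cost. It assumes the cost has the form $\sum_d c_d |x_d - x^A_d|$ with finite, strictly positive weights; the function names are hypothetical.

```python
def weighted_l1_cost(x, x_a, weights):
    """Assumed weighted l1 cost: sum over d of c_d * |x_d - x_a_d|."""
    return sum(c * abs(xd - ad) for c, xd, ad in zip(weights, x, x_a))


def l1_unit_directions(weights):
    """Return the 2*D axis-aligned vectors +/- e_d / c_d, each of unit cost."""
    dim = len(weights)
    directions = []
    for d, c in enumerate(weights):
        if not (0.0 < c < float("inf")):
            raise ValueError("this sketch assumes finite, strictly positive weights")
        for sign in (1.0, -1.0):
            w = [0.0] * dim
            w[d] = sign / c       # rescale so the vector has cost exactly 1
            directions.append(tuple(w))
    return directions
```

Each returned vector is an offset from the origin $\mathbf{x}^A$, so a query at cost $C$ along direction `w` is the point `x_a + C * w`.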

3.1.1 K-step Multi-Line Search

Here we present a variant of the multi-line search algorithm that better exploits pruning
to reduce the query complexity of Algorithm 1; we call this variant K-step MultiLineSearch.
The MultiLineSearch algorithm is $|\mathcal{W}|$ simultaneous binary searches
(breadth-first). This strategy prunes directions most effectively when the convex body is
asymmetrically elongated relative to $\mathbf{x}^A$ but fails to prune for symmetrically rounded bodies.
Instead we could search each direction sequentially (depth-first) and still obtain a worst case
of $O(L_\epsilon^{(*)} \cdot D)$ queries. In contrast, this strategy reduces queries used to shrink the cost gap
on symmetrically rounded bodies but is unable to do so for asymmetrically elongated bodies.
We therefore propose an algorithm that mixes these strategies.

At each phase, the K-step MultiLineSearch (Algorithm 4) chooses a single direction
$\mathbf{e}$ and queries it for $K$ steps to generate candidate bounds $B^-$ and $B^+$ on the MAC. The
algorithm makes substantial progress towards reducing $G_t$ without querying other directions
(depth-first). It then iteratively queries all remaining directions at the candidate lower
bound $B^+$ (breadth-first). Again we use lazy querying and stop as soon as a negative
instance is found since $B^+$ is then no longer a viable lower bound. In this case, although
the candidate bound is invalidated, we can still prune all directions that were positive at



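distance $B^-$, the classifier would be foolish to subsequently reveal that another direction has a '+' at
the same distance since it freely allows the adversary to prune a search direction. Hence, a malicious
classifier will always respond with '-' for any cost where a negative instance has already been revealed.
Thus, our algorithm uses lazy querying and only queries at costs below our upper bound $C_t^-$ on the
MAC.

Algorithm 1 Multi-Line Search
MLS($\mathcal{W}$, $\mathbf{x}^A$, $\mathbf{x}^-$, $C_0^+$, $C_0^-$, $\epsilon$)
  $\mathbf{x}^* \leftarrow \mathbf{x}^-$
  $t \leftarrow 0$
  while $C_t^- / C_t^+ > 1 + \epsilon$ do begin
    $C_t \leftarrow \sqrt{C_t^+ \cdot C_t^-}$
    for all $\mathbf{e} \in \mathcal{W}$ do begin
      Query: $f_t^{\mathbf{e}} \leftarrow f(\mathbf{x}^A + C_t \cdot \mathbf{e})$
      if $f_t^{\mathbf{e}} =$ '-' then begin
        $\mathbf{x}^* \leftarrow \mathbf{x}^A + C_t \cdot \mathbf{e}$
        Prune $\mathbf{i}$ from $\mathcal{W}$ if $f_t^{\mathbf{i}} =$ '+'
        break for-loop
      end if
    end for
    $C_{t+1}^+ \leftarrow C_t^+$ and $C_{t+1}^- \leftarrow C_t^-$
    if $\forall \mathbf{e} \in \mathcal{W}$, $f_t^{\mathbf{e}} =$ '+' then $C_{t+1}^+ \leftarrow C_t$
    else $C_{t+1}^- \leftarrow C_t$
    $t \leftarrow t + 1$
  end while
  return: $\mathbf{x}^*$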

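Algorithm 2 Convex $\mathcal{X}_f^+$ Set Search
ConvexSearch($\mathcal{W}$, $\mathbf{x}^A$, $\mathbf{x}^-$, $\epsilon$, $C^+$)
  $C^- \leftarrow A(\mathbf{x}^-)$
  for all $i \in 1 \ldots D$ do begin
    $\mathbf{e}^i \leftarrow \frac{1}{c_i} \cdot \mathbf{e}^i$
    $\mathcal{W} \leftarrow \mathcal{W} \cup \{\pm \mathbf{e}^i\}$
  end for
  return: MLS($\mathcal{W}$, $\mathbf{x}^A$, $\mathbf{x}^-$, $C^+$, $C^-$, $\epsilon$)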



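Algorithm 3 Linear $\mathcal{X}_f^+$ Set Search
LinearSearch($\mathcal{W}$, $\mathbf{x}^A$, $\mathbf{x}^-$, $\epsilon$, $C^+$)
  $C^- \leftarrow A(\mathbf{x}^-)$
  $\mathcal{W} \leftarrow \emptyset$
  for all $i \in 1 \ldots D$ do begin
    $\mathbf{e}^i \leftarrow \frac{1}{c_i} \cdot \mathbf{e}^i$
    $b_i \leftarrow \mathrm{sign}(x_i^- - x_i^A)$
    if $b_i \ne 0$ then $\mathcal{W} \leftarrow \mathcal{W} \cup \{b_i \mathbf{e}^i\}$
    else $\mathcal{W} \leftarrow \mathcal{W} \cup \{\pm \mathbf{e}^i\}$
  end for
  return: MLS($\mathcal{W}$, $\mathbf{x}^A$, $\mathbf{x}^-$, $C^+$, $C^-$, $\epsilon$)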



$B^+$. Thus, in every iteration, either the gap is decreased or at least one search direction is
pruned. We show that for $K = \lceil \sqrt{L_\epsilon^{(*)}} \rceil$, the algorithm achieves a delicate balance between
breadth-first and depth-first approaches to attain a better worst-case complexity than either.

Theorem 5 Algorithm 4 will find an $\epsilon$-IMAC with at most $O(L_\epsilon^{(*)} + \sqrt{L_\epsilon^{(*)}}\,|\mathcal{W}|)$ queries when
$K = \lceil \sqrt{L_\epsilon^{(*)}} \rceil$.

The proof of this theorem appears in the appendix. As a consequence of Theorem 5,
finding an $\epsilon$-IMAC with Algorithm 4 for an $\ell_1$ cost requires $O(L_\epsilon^{(*)} + \sqrt{L_\epsilon^{(*)}}\,D)$ queries. Further,
both Algorithms 2 and 3 can incorporate K-step MultiLineSearch directly by replacing
their function call to MLS with a call to KMLS and using $K = \lceil \sqrt{L_\epsilon^{(*)}} \rceil$.
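The mixed depth-first and breadth-first strategy can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: `classify` is a hypothetical oracle returning `True` for '+', the directions are assumed to be unit-cost vectors from `x_a`, and the shortcut taken when the depth-first phase finds no positive point is a choice made for this sketch. The toy box classifier exists only to exercise the function.

```python
import math


def k_step_multi_line_search(classify, x_a, directions, c_plus, c_minus, eps, k):
    """Sketch of K-step multi-line search (k >= 1 bisection steps per phase)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    active = list(directions)
    best = None
    queries = 0

    def point(cost, w):
        return tuple(a + cost * wi for a, wi in zip(x_a, w))

    while c_minus / c_plus > 1.0 + eps:
        e = active[0]                       # any remaining direction will do
        b_plus, b_minus = c_plus, c_minus
        for _ in range(k):                  # depth-first: k steps along e
            b = math.sqrt(b_plus * b_minus)
            queries += 1
            if classify(point(b, e)):
                b_plus = b
            else:
                b_minus = b
                best = point(b, e)
        if b_plus == c_plus:                # no positive step, nothing new to verify
            c_minus = b_minus
            continue
        positive = [e]                      # e is known to be positive at b_plus
        negative_found = False
        for w in active[1:]:                # breadth-first check of the candidate
            queries += 1
            if classify(point(b_plus, w)):
                positive.append(w)
            else:
                best = point(b_plus, w)
                negative_found = True
                break                       # lazy querying
        if negative_found:
            c_minus = b_plus                # the candidate is an upper bound instead
            active = [w for w in active if w not in positive]
        else:
            c_plus, c_minus = b_plus, b_minus
    return best, c_plus, c_minus, queries


def box_classifier(x):
    """Hypothetical convex positive set: the box |x0| <= 2, |x1| <= 0.5."""
    return abs(x[0]) <= 2.0 and abs(x[1]) <= 0.5
```

With the four axis-aligned unit directions in two dimensions, initial bounds 0.1 and 4.0, $\epsilon = 0.01$ and `k = 3`, the sketch prunes both horizontal directions in its first phase and then closes the gap around the true minimal cost of 0.5 using only the vertical directions.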

3.1.2 Lower Bound 

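Here we find lower bounds on the number of queries required by any algorithm to find an
$\epsilon$-IMAC when $\mathcal{X}_f^+$ is convex for any convex cost function (e.g., Eq. (1) for $p \ge 1$). Below we
present two theorems, one for additive and one for multiplicative optimality. Notably, since
an $\epsilon$-IMAC uses multiplicative optimality, we incorporate a lower bound $C_0^+ > 0$ on the
MAC into our statement.

Algorithm 4 K-Step Multi-Line Search
KMLS($\mathcal{W}$, $\mathbf{x}^A$, $\mathbf{x}^-$, $C_0^+$, $C_0^-$, $\epsilon$, $K$)
  $\mathbf{x}^* \leftarrow \mathbf{x}^-$
  while $C_t^- / C_t^+ > 1 + \epsilon$ do begin
    Choose a direction $\mathbf{e} \in \mathcal{W}$
    $B^+ \leftarrow C_t^+$
    $B^- \leftarrow C_t^-$
    for $K$ steps do begin
      $B \leftarrow \sqrt{B^+ \cdot B^-}$
      Query: $f_{\mathbf{e}} \leftarrow f(\mathbf{x}^A + B \cdot \mathbf{e})$
      if $f_{\mathbf{e}} =$ '+' then $B^+ \leftarrow B$
      else $B^- \leftarrow B$ and $\mathbf{x}^* \leftarrow \mathbf{x}^A + B \cdot \mathbf{e}$
    end for
    for all $\mathbf{i} \ne \mathbf{e} \in \mathcal{W}$ do begin
      Query: $f_t^{\mathbf{i}} \leftarrow f(\mathbf{x}^A + (B^+) \cdot \mathbf{i})$
      if $f_t^{\mathbf{i}} =$ '-' then begin
        $\mathbf{x}^* \leftarrow \mathbf{x}^A + (B^+) \cdot \mathbf{i}$
        Prune $\mathbf{k}$ from $\mathcal{W}$ if $f_t^{\mathbf{k}} =$ '+'
        break for-loop
      end if
    end for
    $C_{t+1}^- \leftarrow B^-$
    if $\forall \mathbf{i} \in \mathcal{W}$, $f_t^{\mathbf{i}} =$ '+' then $C_{t+1}^+ \leftarrow B^+$
    else $C_{t+1}^- \leftarrow B^+$
    $t \leftarrow t + 1$
  end while
  return: $\mathbf{x}^*$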



Theorem 6 For any $D > 0$, any positive convex function $A : \mathbb{R}^D \to \mathbb{R}^+$, any initial bounds
$0 < C_0^+ < C_0^-$ on the MAC, and $0 < \eta < C_0^- - C_0^+$, all algorithms must submit at least
$\max\{D, L_\eta^{(+)}\}$ membership queries in the worst case to be $\eta$-additive optimal on $\mathcal{F}^{\mathrm{convex},'+'}$.

Theorem 7 For any $D > 0$, any positive convex function $A : \mathbb{R}^D \to \mathbb{R}^+$, any initial
bounds $0 < C_0^+ < C_0^-$ on the MAC, and $0 < \epsilon < \frac{C_0^-}{C_0^+} - 1$, all algorithms must submit at
least $\max\{D, L_\epsilon^{(*)}\}$ membership queries in the worst case to be $\epsilon$-multiplicatively optimal on
$\mathcal{F}^{\mathrm{convex},'+'}$.

The proof of both of these theorems is in Appendix B. In these theorems, we restrict
$\eta$ and $\epsilon$ to the intervals $(0, C_0^- - C_0^+)$ and $(0, \frac{C_0^-}{C_0^+} - 1)$ respectively. In fact, outside of
these intervals the query strategies are trivial. For either $\eta = 0$ or $\epsilon = 0$ no approximation
algorithm will terminate, and for $\eta \ge C_0^- - C_0^+$ or $\epsilon \ge \frac{C_0^-}{C_0^+} - 1$, $\mathbf{x}^-$ is an IMAC, so no queries
are required.

Theorems 6 and 7 show that $\eta$-additive and $\epsilon$-multiplicative optimality
require $\Omega(L_\eta^{(+)} + D)$ and $\Omega(L_\epsilon^{(*)} + D)$ queries respectively. Thus, we see that our K-step
MultiLineSearch algorithm (Algorithm 4) has close to the optimal query complexity for
$\ell_1$ costs with its $O(L_\epsilon^{(*)} + \sqrt{L_\epsilon^{(*)}}\,D)$ queries. These results also hold for arbitrary $\ell_p$ ($p > 1$) costs
but we show lower bounds in Section 4 for $p > 1$ that substantially exceed these results.

3.1.3 Special Cases 

Here we present a number of special cases that require minor modifications to Algorithms 1
and 4, primarily as preprocessing steps.

Revisiting Linear Classifiers: Lowd and Meek originally developed a method for reverse
engineering linear classifiers for an $\ell_1$ cost. First their method isolates a sequence of points
from $\mathbf{x}^-$ to $\mathbf{x}^A$ that cross the classifier's boundary and then it estimates the hyperplane's
parameters using $D$ line searches. However, as a consequence of the ability to efficiently
minimize our objective when $\mathcal{X}_f^+$ is convex, we immediately have an alternative method for
linear classifiers (i.e., half-spaces). In fact, for this special case, as many as half of the search
directions can be eliminated using the initial orientation of the hyperplane separating $\mathbf{x}^A$
and $\mathbf{x}^-$. Intuitively, the minimizer in the negative halfspace can only occur along one of the
axes of the orthants that contain $\mathbf{x}^-$. This algorithm is presented as Algorithm 3. Moreover,
because linear classifiers are a special case of convex-inducing classifiers, our K-step
MultiLineSearch algorithm improves on the reverse-engineering technique's $O(L_\epsilon^{(*)} \cdot D)$
queries and applies to a broader family.
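The direction-elimination step for linear classifiers can be sketched as follows. This Python fragment is an illustration under assumptions: it takes a weighted $\ell_1$ cost with finite positive weights, and the function name is hypothetical. For each coordinate it keeps only the axis direction pointing from $\mathbf{x}^A$ toward $\mathbf{x}^-$, and keeps both directions when the two points agree in that coordinate.

```python
def linear_search_directions(x_a, x_minus, weights):
    """Unit-cost axis directions restricted to the orthants containing x_minus."""
    dim = len(x_a)
    directions = []
    for d in range(dim):
        diff = x_minus[d] - x_a[d]
        if diff > 0:
            signs = (1.0,)
        elif diff < 0:
            signs = (-1.0,)
        else:
            signs = (1.0, -1.0)      # no orientation information for this axis
        for s in signs:
            w = [0.0] * dim
            w[d] = s / weights[d]    # rescale to unit weighted-l1 cost
            directions.append(tuple(w))
    return directions
```

When $\mathbf{x}^-$ differs from $\mathbf{x}^A$ in every coordinate, this yields $D$ directions instead of $2 \cdot D$.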

Extending MultiLineSearch algorithms to $c_d = \infty$ or $c_d = 0$: In Algorithms 2 and 3,
we reweighted the $d$-th axis-aligned directions by a factor $\frac{1}{c_d}$ to make unit cost vectors,
implicitly assuming $c_d \in (0, \infty)$. The case where $c_d = \infty$ (e.g., immutable features) is
dealt with by simply removing those features from the set of search directions $\mathcal{W}$ used in
MultiLineSearch. In the case when $c_d = 0$ (e.g., useless features), MultiLineSearch-like
algorithms no longer ensure near-optimality because they implicitly assume that cost
balls are bounded sets. If $c_d = 0$, $\mathbb{B}^C(A)$ is no longer bounded and a 0-cost could be achieved
if $\mathcal{X}_f^-$ anywhere intersects the subspace spanned by the 0-cost features; this makes
near-optimality unachievable unless a negative 0-cost instance can be found. In the worst case,
such an instance could be arbitrarily far in any direction within the 0-cost subspace, making
search for such an instance intractable. Nonetheless, one possible search strategy is to
assign all 0-cost features a non-zero weight that decays quickly toward 0 (e.g., $c_d = 2^{-t}$ in
the $t$-th iteration) as we repeatedly rerun MultiLineSearch on the altered objective for
$T$ iterations. We will either find a negative instance that only alters 0-cost features (and
hence is a 0-IMAC), or we will terminate assuming no such instance exists. This algorithm
does not ensure near-optimality but may find a suitable instance with only $T$ runs of
MultiLineSearch.

Lack of an Initial Lower Bound: Thus far, to find an $\epsilon$-IMAC our algorithms have
searched between initial bounds $C_0^+$ and $C_0^-$, but, in general, $C_0^+$ may not be known to
a real-world adversary. We now present an algorithm we call SpiralSearch that can
efficiently establish a lower bound on the MAC if one exists. This algorithm performs
a halving search on the exponent along a single direction to find a positive example, then
queries the remaining directions at that cost. Either the lower bound is verified or directions
that were positive can be pruned for the remainder of the search.

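Algorithm 5 Spiral Search
spiral($\mathcal{W}$, $\mathbf{x}^A$, $\mathbf{x}^-$, $C_0^-$, $\epsilon$)
  $t \leftarrow 0$ and $\mathcal{V} \leftarrow \emptyset$
  repeat
    Choose a direction $\mathbf{e} \in \mathcal{W}$
    Remove $\mathbf{e}$ from $\mathcal{W}$ and $\mathcal{V} \leftarrow \mathcal{V} \cup \{\mathbf{e}\}$
    Query: $f_t \leftarrow f(\mathbf{x}^A + (C_0^-) 2^{-2^t} \cdot \mathbf{e})$
    if $f_t =$ '-' then begin
      $\mathcal{W} \leftarrow \mathcal{W} \cup \{\mathbf{e}\}$ and $\mathcal{V} \leftarrow \emptyset$
      $t \leftarrow t + 1$
    end if
  until $\mathcal{W} = \emptyset$
  return: ($\mathcal{V}$, $C_0^+$, $C_0^-$)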



At the $t$-th iteration of SpiralSearch a direction is selected and queried at the current
lower bound of $(C_0^-) 2^{-2^t}$. If the query is positive, that direction is added to the set $\mathcal{V}$ of
directions consistent with the lower bound. Otherwise, all directions in $\mathcal{V}$ are discarded
and the lower bound is lowered with an exponentially decreasing exponent. Thus, given
that some lower bound $C_0^+ > 0$ does exist, one will be found in $O(L_\epsilon^{(*)} + D)$ queries and this
algorithm can be used as a precursor to any of the previous searches.^6 Further, the search
directions pruned by SpiralSearch are also invalid for the subsequent MultiLineSearch
so the set $\mathcal{V}$ returned by SpiralSearch will be used as the set $\mathcal{W}$ for the subsequent search.
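A minimal Python sketch of this precursor step follows. It is an illustration under assumptions rather than the authors' implementation: `classify` is a hypothetical oracle returning `True` for '+', the `max_rounds` cap and the tightened upper bound in the return value are additions made for this sketch, and the toy box classifier is invented to exercise it.

```python
def spiral_search(classify, x_a, directions, c_minus, max_rounds=8):
    """Find a lower bound c_minus * 2**(-2**t) that every kept direction supports."""
    remaining = list(directions)
    verified = []        # directions positive at the current candidate lower bound
    t = 0
    queries = 0
    while remaining:
        if t >= max_rounds:
            raise RuntimeError("no lower bound found within max_rounds")
        e = remaining.pop()
        verified.append(e)
        cost = c_minus * 2.0 ** (-(2 ** t))
        queries += 1
        x = tuple(a + cost * wi for a, wi in zip(x_a, e))
        if not classify(x):
            # Negative at this cost: e must be retried at a lower cost, and the
            # directions that were positive at this cost are pruned for good.
            remaining.append(e)
            verified = []
            t += 1
    c_plus = c_minus * 2.0 ** (-(2 ** t))
    # Addition for this sketch: the last negative query tightens the upper bound.
    c_upper = c_minus if t == 0 else c_minus * 2.0 ** (-(2 ** (t - 1)))
    return verified, c_plus, c_upper, queries


def box_classifier(x):
    """Hypothetical convex positive set: the box |x0| <= 2, |x1| <= 0.5."""
    return abs(x[0]) <= 2.0 and abs(x[1]) <= 0.5
```

On the toy box with an initial upper bound of 4, the horizontal directions are pruned after the first negative response, and the search ends with the two vertical directions and bounds that bracket the true minimal cost of 0.5.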

Lack of a Negative Example: Our algorithms can also naturally be adapted to the
case when the adversary has no negative example $\mathbf{x}^-$. This is accomplished by querying $\ell_1$
balls of doubly exponentially increasing cost until a negative instance is found. During the
$t$-th iteration, we probe along every search direction at a cost $(C_0^+) 2^{2^t}$; either all probes are
positive (and we have a new lower bound) or at least one is negative and we can terminate the
search. Once a negative example is located (having probed for $T$ iterations), we must have
$(C_0^+) 2^{2^{T-1}} < \mathrm{MAC}(f, A) \le (C_0^+) 2^{2^T}$; thus, $T = \lceil \log_2 \log_2 \frac{\mathrm{MAC}(f, A)}{C_0^+} \rceil$. We can subsequently
perform MultiLineSearch with $C_0^+ = 2^{2^{T-1}}$ and $C_0^- = 2^{2^T}$; i.e., $\log_2 G_0 = 2^{T-1}$. This
precursor step requires at most $|\mathcal{W}| \cdot T$ queries to initialize the MultiLineSearch algorithm
with a gap such that $L_\epsilon^{(*)} = \lceil (T - 1) + \log_2 \frac{1}{\log_2 (1 + \epsilon)} \rceil$ according to Eq. (5).

If there is neither an initial upper bound nor an initial lower bound, we proceed by probing each
search direction at cost 1 using an additional $|\mathcal{W}|$ queries; we will subsequently have
either an upper or a lower bound and can proceed accordingly.
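The doubly exponential probing described above can be sketched in Python as follows. This is an illustration under assumptions: `classify` is a hypothetical oracle returning `True` for '+', the `max_rounds` cap is an addition that keeps the floating-point powers finite, and the toy box classifier is invented for the example.

```python
def find_negative_by_doubling(classify, x_a, directions, c_plus, max_rounds=8):
    """Probe every direction at cost c_plus * 2**(2**t) for t = 0, 1, 2, ...

    Returns (negative instance, lower bound, upper bound, query count);
    the instance and upper bound are None if no negative point was found.
    """
    queries = 0
    lower = c_plus
    for t in range(max_rounds):
        cost = c_plus * 2.0 ** (2 ** t)
        for w in directions:
            queries += 1
            x = tuple(a + cost * wi for a, wi in zip(x_a, w))
            if not classify(x):
                return x, lower, cost, queries   # lazy: stop at the first negative
        lower = cost                             # all probes positive: new lower bound
    return None, lower, None, queries


def box_classifier(x):
    """Hypothetical convex positive set: the box |x0| <= 2, |x1| <= 0.5."""
    return abs(x[0]) <= 2.0 and abs(x[1]) <= 0.5
```

Starting from a lower bound of 0.1 on the toy box, the probes at costs 0.2 and 0.4 are all positive and the third round finds a negative point at cost 1.6, which supplies both bounds needed to start the main search.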



6. If no lower bound on the cost exists, no algorithm can find an $\epsilon$-IMAC. As presented, this algorithm
would not terminate, but in practice the search would be terminated after sufficiently many iterations.




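Algorithm 6 Intersect Search
IntersectSearch($\mathcal{P}^0$, $\mathcal{Q} = \{\mathbf{x}^j \in \mathcal{P}^0\}$, $C$)
  for all $s = 1 \ldots T$ do begin
    (1) Generate $2N$ samples $\{\mathbf{x}^j\}_{j=1}^{2N}$:
          Choose $\mathbf{x}$ from $\mathcal{Q}$
          $\mathbf{x}^j \leftarrow$ HitRun($\mathcal{P}^{s-1}$, $\mathcal{Q}$, $\mathbf{x}$)
    (2) If any $\mathbf{x}^j$ has $A(\mathbf{x}^j) \le C$, terminate the for-loop
    (3) Put samples into 2 sets of size $N$: $\mathcal{R}$ and $\mathcal{S}$
    (4) $\mathbf{z}_s \leftarrow N^{-1} \sum_{\mathbf{x} \in \mathcal{R}} \mathbf{x}$
    (5) Compute $\mathcal{H}_{\mathbf{h}^{\mathbf{z}_s}}$ using Eq. (8)
    (6) $\mathcal{P}^s \leftarrow \mathcal{P}^{s-1} \cap \mathcal{H}_{\mathbf{h}^{\mathbf{z}_s}}$
    (7) Keep samples in $\mathcal{P}^s$:
          $\mathcal{Q} \leftarrow \{\mathbf{x} \in \mathcal{S} \wedge \mathbf{x} \in \mathcal{P}^s\}$
  end for
  Return: the found [$\mathbf{x}^j$, $\mathcal{P}^s$, $\mathcal{Q}$]; or No Intersect

Algorithm 7 Hit-and-Run
HitRun($\mathcal{P}$, $\{\mathbf{y}^j\}$, $\mathbf{x}^0$)
  for all $i = 1 \ldots K$ do begin
    (1) Choose a random direction:
          $\nu_j \sim N(0, 1)$
    (2) Sample uniformly along $\mathbf{v}$ using rejection sampling:
          Choose $\Omega$ s.t. $\mathbf{x}^{i-1} + \Omega \cdot \mathbf{v} \notin \mathcal{P}$
          repeat
            $\omega \sim \mathrm{Unif}(0, \Omega)$
            $\mathbf{x}^i \leftarrow \mathbf{x}^{i-1} + \omega \cdot \mathbf{v}$
          until $\mathbf{x}^i \in \mathcal{P}$
  end for
  Return: $\mathbf{x}^K$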



3.2 $\epsilon$-IMAC Learning for a Convex $\mathcal{X}_f^-$

In this section, we consider minimizing a convex cost function $A$ (we focus on weighted $\ell_1$
costs in Eq. (1)) when the feasible set $\mathcal{X}_f^-$ is convex. Any convex function can be efficiently
minimized within a known convex set (e.g., using the Ellipsoid Method and Interior Point
methods; see Boyd and Vandenberghe 2004). However, in our problem the convex set is
only accessible via membership queries. We use a randomized polynomial algorithm of
Bertsimas and Vempala (2004) to minimize the cost function $A$ given an initial point
$\mathbf{x}^- \in \mathcal{X}_f^-$. For any fixed cost $C^t$ we use their algorithm to determine (with high probability)
whether $\mathcal{X}_f^-$ intersects with $\mathbb{B}^{C^t}(A)$; i.e., whether $C^t$ is a new lower or upper bound on
the MAC. With high probability, we find an $\epsilon$-IMAC in no more than $L_\epsilon^{(*)}$ repetitions using
binary search. We now focus only on weighted $\ell_1$ costs (Eq. (1)) and return to more general
cases in Section 4.2.



3.2.1 Intersection of Convex Sets 



We now outline Bertsimas and Vempala's query-based procedure for determining whether
two convex sets (e.g., $\mathcal{X}_f^-$ and $\mathbb{B}^{C^t}(A_1)$) intersect. Their IntersectSearch procedure
(which we present as Algorithm 6) is a randomized Ellipsoid method for determining
whether there is an intersection between two bounded convex sets: $\mathcal{P}$ is only accessible
through membership queries and $\mathbb{B}$ provides a separating hyperplane for any point outside
it. They use efficient query-based approaches to uniformly sample from $\mathcal{P}$ to obtain
sufficiently many samples such that cutting $\mathcal{P}$ through the centroid of these samples with a
separating hyperplane from $\mathbb{B}$ will significantly reduce the volume of $\mathcal{P}$ with high
probability. Their technique thus constructs a sequence of progressively smaller feasible sets
$\mathcal{P}^s \subset \mathcal{P}^{s-1}$ until either the algorithm finds a point in $\mathcal{P} \cap \mathbb{B}$ or it is highly unlikely that the
intersection is non-empty.

Our problem reduces to finding the intersection between $\mathcal{X}_f^-$ and $\mathbb{B}^{C^t}(A_1)$. Though
$\mathcal{X}_f^-$ may be unbounded, we are minimizing a cost with bounded equi-cost balls, so we can
instead use the set $\mathcal{P}^0 = \mathcal{X}_f^- \cap \mathbb{B}^{2R}(A_1; \mathbf{x}^-)$ (where $R = A(\mathbf{x}^-) > C^t$), which is a (convex) bounded
subset of $\mathcal{X}_f^-$ that envelops all of $\mathbb{B}^{C^t}(A_1)$ and thus the intersection $\mathcal{X}_f^- \cap \mathbb{B}^{C^t}(A_1)$ if it
exists. We also assume that there is some $r > 0$ such that there is an $r$-ball contained in
the convex set $\mathcal{X}_f^-$; i.e., there exists $\mathbf{y} \in \mathcal{X}_f^-$ such that $\mathbb{B}^r(A_1; \mathbf{y}) \subset \mathcal{X}_f^-$. We now detail
this IntersectSearch procedure (Algorithm 6).

The backbone of the algorithm is the capability to sample uniformly from an unknown
but bounded convex body by means of the hit-and-run random walk technique introduced
by Smith (1996) (Algorithm 7). Given an instance $\mathbf{x}^j \in \mathcal{P}^{s-1}$, hit-and-run selects
a random direction $\mathbf{v}$ through $\mathbf{x}^j$ (we return to the selection of $\mathbf{v}$ in Section 3.2.2). Since
$\mathcal{P}^{s-1}$ is a bounded convex set, the set $\Omega = \{\omega > 0 \mid \mathbf{x}^j + \omega \mathbf{v} \in \mathcal{P}^{s-1}\}$ is a bounded interval
indexing all feasible points along direction $\mathbf{v}$ through $\mathbf{x}^j$. Sampling $\omega$ uniformly from $\Omega$
(using rejection sampling) yields the next step of the random walk: $\mathbf{x}^j + \omega \mathbf{v}$. Under the
appropriate conditions (see Section 3.2.2), the hit-and-run random walk generates a sample
uniformly from the convex body after $O^*(D^3)$ steps^7 (Lovász and Vempala, 2004).
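A single hit-and-run walk over a membership oracle can be sketched in Python as follows. This is a minimal illustration, not the procedure of Algorithm 7 verbatim: it draws isotropic Gaussian directions and samples along the full chord in both directions, which is the textbook form of hit-and-run. The `reach` parameter (a step length guaranteed to leave the body) and the function name are assumptions of this sketch.

```python
import math
import random


def hit_and_run(in_body, x0, steps, reach, rng=None):
    """Random walk inside a bounded convex body known only through in_body(x).

    x0 must lie in the body, and reach must exceed the body's diameter.
    """
    rng = rng or random.Random(0)
    x = list(x0)
    dim = len(x)
    for _ in range(steps):
        # Random unit direction from an isotropic Gaussian.
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
        # Rejection sampling along the line, shrinking the interval on each
        # miss; by convexity everything beyond a rejected point is outside.
        lo, hi = -reach, reach
        while True:
            omega = rng.uniform(lo, hi)
            candidate = [xi + omega * vi for xi, vi in zip(x, v)]
            if in_body(candidate):
                x = candidate
                break
            if omega < 0.0:
                lo = omega
            else:
                hi = omega
    return x
```

Each membership test inside the inner loop is one query to the classifier, so the query cost of a sample is the number of walk steps times the (typically small) number of rejections per step.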



Randomized Ellipsoid Algorithm: We use hit-and-run to obtain $2N$ samples $\{\mathbf{x}^j\}$
from $\mathcal{P}^{s-1} \subset \mathcal{X}_f^-$ for a single phase of the randomized ellipsoid algorithm. If any sample $\mathbf{x}^j$
satisfies $A_1(\mathbf{x}^j) \le C^t$, then $\mathbf{x}^j$ is in the intersection of $\mathcal{X}_f^-$ and $\mathbb{B}^{C^t}(A_1)$ and the procedure is
complete. Otherwise, we want to significantly reduce the size of $\mathcal{P}^{s-1}$ without excluding any
of $\mathbb{B}^{C^t}(A_1)$ so that sampling concentrates toward the intersection (if it exists); for this we
need a separating hyperplane for $\mathbb{B}^{C^t}(A_1)$. For any point $\mathbf{y} \notin \mathbb{B}^{C^t}(A_1)$, the (sub)gradient
of the $\ell_1$ cost is given by

    $h^{\mathbf{y}}_d = c_d \, \mathrm{sign}(y_d - x^A_d)$,    (7)

and is a separating hyperplane for $\mathbf{y}$ and $\mathbb{B}^{C^t}(A_1)$.

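To achieve efficiency, we choose a point $\mathbf{z} \in \mathcal{P}^{s-1}$ so that cutting $\mathcal{P}^{s-1}$ through $\mathbf{z}$
with the hyperplane $\mathbf{h}^{\mathbf{z}}$ eliminates a significant fraction of $\mathcal{P}^{s-1}$. To do so, $\mathbf{z}$ must be
centrally located within $\mathcal{P}^{s-1}$. We use the empirical centroid of the half of our samples in
$\mathcal{R}$: $\mathbf{z} = N^{-1} \sum_{\mathbf{x} \in \mathcal{R}} \mathbf{x}$ (the other half will be used in Section 3.2.2). We cut $\mathcal{P}^{s-1}$ with
the hyperplane $\mathbf{h}^{\mathbf{z}}$ through $\mathbf{z}$; i.e., $\mathcal{P}^s = \mathcal{P}^{s-1} \cap \mathcal{H}_{\mathbf{h}^{\mathbf{z}}}$ where $\mathcal{H}_{\mathbf{h}^{\mathbf{z}}}$ is the halfspace

    $\mathcal{H}_{\mathbf{h}^{\mathbf{z}}} = \{ \mathbf{x} \mid \mathbf{x}^\top \mathbf{h}^{\mathbf{z}} \le \mathbf{z}^\top \mathbf{h}^{\mathbf{z}} \}$.    (8)

As shown by Bertsimas and Vempala, this cut achieves $\mathrm{vol}(\mathcal{P}^s) \le \frac{2}{3} \mathrm{vol}(\mathcal{P}^{s-1})$ with high
probability if $N = O^*(D)$ and $\mathcal{P}^{s-1}$ is near-isotropic (see Section 3.2.2). Since the ratio of
volumes between the initial circumscribing and inscribing balls of the feasible set is $(R/r)^D$,
the algorithm can terminate after $T = O(D \log(R/r))$ unsuccessful iterations with a high
probability that the intersection is empty.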
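One cut of this procedure can be sketched in Python as follows. The sketch assumes a weighted $\ell_1$ cost with (sub)gradient $c_d \, \mathrm{sign}(y_d - x^A_d)$ and represents the feasible set only through the samples that survive the cut; the function names are hypothetical.

```python
def l1_subgradient(y, x_a, weights):
    """(Sub)gradient of the weighted l1 cost at y: c_d * sign(y_d - x_a_d)."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [c * sign(yd - ad) for c, yd, ad in zip(weights, y, x_a)]


def centroid(points):
    """Empirical centroid of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def cut(samples_r, samples_s, x_a, weights):
    """Cut through the centroid of samples_r; keep the samples_s on the ball side."""
    z = centroid(samples_r)
    h = l1_subgradient(z, x_a, weights)
    threshold = sum(hd * zd for hd, zd in zip(h, z))
    kept = [x for x in samples_s
            if sum(hd * xd for hd, xd in zip(h, x)) <= threshold]
    return z, h, kept
```

Because the cost is convex, every point cheaper than the centroid satisfies the halfspace inequality, so the cut never discards any part of a cost ball smaller than the centroid's cost.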

Because every iteration in Algorithm 6 requires $N = O^*(D)$ samples, each of which
needs $K = O^*(D^3)$ random walk steps, and there are $O^*(D)$ iterations, the total number
of membership queries required by Algorithm 6 is $O^*(D^5)$.

7. $O^*(\cdot)$ denotes the standard complexity notation $O(\cdot)$ without logarithmic terms.

3.2.2 Sampling from a Queriable Convex Body

In the randomized Ellipsoid algorithm, random samples are used for two purposes: estimating
the convex body's centroid and maintaining the conditions required for the hit-and-run
sampler to efficiently generate points uniformly from a sequence of shrinking convex bodies.
Until this point, we assumed the hit-and-run random walk efficiently produces uniformly
random samples from any bounded convex body $\mathcal{P}$ accessible through membership queries.
However, if the body is severely elongated, randomly selected directions will rarely align
with the long axis of the body and our random walk will take small steps (relative to the
long axis) and mix slowly. For the sampler to mix effectively, we need the convex body
$\mathcal{P}$ to be sufficiently round, or more formally near-isotropic; i.e., for any unit vector $\mathbf{v}$,
$\mathbb{E}_{\mathbf{x} \sim \mathcal{P}}[(\mathbf{v}^\top (\mathbf{x} - \mathbb{E}_{\mathbf{x} \sim \mathcal{P}}[\mathbf{x}]))^2]$ is bounded between 1/2 and 3/2 of $\mathrm{vol}(\mathcal{P})$.

If the body is not near-isotropic, we must rescale $\mathcal{X}$ with an appropriate affine
transformation $T$ so the resulting body $\mathcal{P}'$ is near-isotropic. With sufficiently many samples from
$\mathcal{P}$ we can estimate $T$ as their empirical covariance matrix. Instead, we rescale $\mathcal{X}$ implicitly
using a technique described by Bertsimas and Vempala (2004). We maintain a set $\mathcal{Q}$ of
sufficiently many uniform samples from the body $\mathcal{P}^s$ and in the hit-and-run algorithm
(Algorithm 7) we sample the direction $\mathbf{v}$ based on this set. Intuitively, because the samples
in $\mathcal{Q}$ are distributed uniformly in $\mathcal{P}^s$, the directions we sample based on the points in $\mathcal{Q}$
implicitly reflect the covariance structure of $\mathcal{P}^s$. This is equivalent to sampling the direction
$\mathbf{v}$ from a normal distribution with zero mean and the covariance of $\mathcal{P}$.
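One way to draw such a direction without forming a covariance matrix is sketched below in Python. It is an assumption of this sketch, not a statement of the cited technique: a Gaussian-weighted sum of the centered samples, scaled by $1/\sqrt{n}$, is a zero-mean Gaussian vector whose covariance equals the empirical covariance of the samples. The function name is hypothetical.

```python
import math
import random


def sample_direction(samples, rng):
    """Zero-mean Gaussian direction with the empirical covariance of samples."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(p[d] for p in samples) / n for d in range(dim)]
    v = [0.0] * dim
    for p in samples:
        nu = rng.gauss(0.0, 1.0)          # one standard normal weight per sample
        for d in range(dim):
            v[d] += nu * (p[d] - mean[d])
    scale = 1.0 / math.sqrt(n)
    return [scale * vd for vd in v]
```

If the samples are stretched along one axis, the directions drawn this way are stretched along the same axis, which is what lets the walk take long steps in an elongated body.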

We must ensure $\mathcal{Q}$ is a set of sufficiently many samples from $\mathcal{P}^s$ after each cut:
$\mathcal{P}^s \leftarrow \mathcal{P}^{s-1} \cap \mathcal{H}_{\mathbf{h}^{\mathbf{z}_s}}$. To do so, we initially resample $2N$ points from $\mathcal{P}^{s-1}$ using hit-and-run; half
of these, $\mathcal{R}$, are used to estimate the centroid $\mathbf{z}_s$ for the cut and the other half, $\mathcal{S}$, are used
to repopulate $\mathcal{Q}$ after the cut. Because $\mathcal{S}$ contains independent uniform samples from $\mathcal{P}^{s-1}$,
those in $\mathcal{P}^s$ after the cut constitute independent uniform samples from $\mathcal{P}^s$ (i.e., rejection
sampling). By choosing $N$ sufficiently large, our cut will be sufficiently deep and we will
have sufficiently many points to resample $\mathcal{P}^s$ after the cut.

Finally, we also need an initial set $\mathcal{Q}$ of uniform samples from $\mathcal{P}^0$ but, in our problem,
we only have a single point $\mathbf{x}^- \in \mathcal{X}_f^-$. Fortunately, there is an iterative procedure for
putting the initial convex set $\mathcal{P}^0$ into a near-isotropic position from which we obtain $\mathcal{Q}$.
The RoundingBody algorithm described by Lovász and Vempala (2003) uses $O^*(D^4)$
membership queries to transform the convex body into a near-isotropic position. We use
this as a preprocessing step for Algorithms 6 and 8; that is, given $\mathcal{X}_f^-$ and $\mathbf{x}^- \in \mathcal{X}_f^-$ we
make $\mathcal{P}^0 = \mathcal{X}_f^- \cap \mathbb{B}^{2R}(A_1; \mathbf{x}^-)$ and then use the RoundingBody algorithm to produce
an initial uniform sample $\mathcal{Q} = \{\mathbf{x}^j \in \mathcal{P}^0\}$. These sets are then the inputs to our search
algorithms.

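3.2.3 Optimization over $\ell_1$ Balls

We now revisit the outermost optimization loop (for searching the minimum feasible cost) of
the algorithm and suggest improvements. First, since $\mathbf{x}^A$, $\mathbf{x}^-$ and $\mathcal{P}^0$ are the same for every
iteration of the optimization procedure, we only need to run the RoundingBody procedure
once as a preprocessing step rather than running it every time
IntersectSearch is invoked. The set of samples $\{\mathbf{x}^j \in \mathcal{P}^0\}$ produced by RoundingBody

Algorithm 8 Convex $\mathcal{X}_f^-$ Set Search
SetSearch($\mathcal{P}$, $\mathcal{Q} = \{\mathbf{x}^j \in \mathcal{P}\}$, $C_0^-$, $C_0^+$, $\epsilon$)
  $\mathbf{x}^* \leftarrow \mathbf{x}^-$ and $t \leftarrow 0$
  while $C_t^- / C_t^+ > 1 + \epsilon$ do begin
    $C_t \leftarrow \sqrt{C_t^- \cdot C_t^+}$
    [$\mathbf{x}^*$, $\mathcal{P}'$, $\mathcal{Q}'$] $\leftarrow$ IntersectSearch($\mathcal{P}$, $\mathcal{Q}$, $C_t$)
    if intersection found then begin
      $C_{t+1}^- \leftarrow A(\mathbf{x}^*)$ and $C_{t+1}^+ \leftarrow C_t^+$
      $\mathcal{P} \leftarrow \mathcal{P}'$ and $\mathcal{Q} \leftarrow \mathcal{Q}'$
    else
      $C_{t+1}^- \leftarrow C_t^-$ and $C_{t+1}^+ \leftarrow C_t$
    end if
    $t \leftarrow t + 1$
  end while
  Return: $\mathbf{x}^*$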



are sufficient to initialize IntersectSearch at each stage of the binary search over
$C^t$. Second, the separating hyperplane $\mathbf{h}^{\mathbf{y}}$ given by Eq. (7) does not depend on the target
cost $C^t$ but only on $\mathbf{x}^A$, the common center of all the $\ell_1$ balls. In fact, the separating
hyperplane at point $\mathbf{y}$ is valid for all $\ell_1$-balls of cost $C < A(\mathbf{y})$. Further, if $C < C^t$, we have
$\mathbb{B}^C(A_1) \subset \mathbb{B}^{C^t}(A_1)$. Thus, the final state from a successful call to IntersectSearch for
the $C^t$-ball can serve as the starting state for any subsequent call to IntersectSearch for all $C < C^t$.
These improvements are reflected in our final procedure SetSearch in Algorithm 8; the
total number of queries required is also $O^*(D^5)$.
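The outer loop can be sketched in Python with the intersection test abstracted behind a callable. Everything named here is hypothetical: `intersect(cost_level, state)` stands in for IntersectSearch and returns a point of cost at most `cost_level` together with its final state, or `None` when no intersection is found, and `toy_intersect` is an exact oracle for a made-up halfspace used only to exercise the loop. The real procedure is randomized, so a reported miss holds only with high probability.

```python
import math


def set_search(intersect, cost, x_minus, c_plus, c_minus, eps):
    """Binary search over the cost level, reusing the state of successful calls."""
    best = x_minus
    state = None
    while c_minus / c_plus > 1.0 + eps:
        level = math.sqrt(c_plus * c_minus)
        found, new_state = intersect(level, state)
        if found is not None:
            best = found
            c_minus = cost(found)    # the point found may be cheaper than the level
            state = new_state        # warm start for every smaller ball
        else:
            c_plus = level
    return best, c_plus, c_minus


def toy_intersect(level, state):
    """Hypothetical exact oracle: negative set {x : x[0] >= 1}, l1 cost from the origin."""
    if level >= 1.0:
        return (1.0, 0.0), state
    return None, state


def l1_cost(x):
    return sum(abs(v) for v in x)
```

Setting the upper bound to the cost of the point actually found, rather than to the queried level, mirrors the update in Algorithm 8 and can shrink the gap by more than one bisection step.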

4. General $\ell_p$ Costs

Here we further extend $\epsilon$-IMAC searchability over the family of convex-inducing classifiers
to the full family of $\ell_p$ costs for any $0 < p \le \infty$. As we demonstrate in this section, many $\ell_p$
costs are not generally $\epsilon$-IMAC searchable for all $\epsilon > 0$ over the family of convex-inducing
classifiers (i.e., we show that finding an $\epsilon$-IMAC for this family can require exponentially
many queries in $D$ and $L_\epsilon^{(*)}$). In fact, only the weighted $\ell_1$ costs are known to have (randomized)
polynomial query strategies when either the positive or negative set is convex.

4.1 Convex Positive Set 

Here we explore the ability of the MultiLineSearch and K-step MultiLineSearch
algorithms presented in Section 3.1 to find solutions to the near-optimal evasion problem for $\ell_p$
cost functions with $p \ne 1$. Particularly for $p > 1$ we will be exploring the consequences of
using the MultiLineSearch algorithms with more search directions than just the $2 \cdot D$
axis-aligned directions. Figure 3 demonstrates how queries can be used to construct upper
and lower bounds on general $\ell_p$ costs. The following lemma also summarizes well-known
bounds on general $\ell_p$ costs based on an $\ell_1$ cost.

Figure 3: Convex hull for a set of queries and the resulting bounding balls for several $\ell_p$
costs. Each row represents a unique set of positive (red '+' points) and negative
(green '-' points) queries and each column shows the implied upper bound (in
green) and lower bound (in blue) for a different $\ell_p$ cost. In the first row, the body
is defined by a random set of 7 queries, in the second, the queries are along the
coordinate axes, and in the third, the queries are around a circle.



Lemma 8 The largest $\ell_p$ ($p > 1$) ball enclosed within the unit $\ell_1$ ball has a radius (cost) of $D^{\frac{1-p}{p}}$,
and for $p = \infty$ the radius is $D^{-1}$.
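Lemma 8 can be checked numerically with a short Python sketch. It relies on the observation that the point of the unit $\ell_1$ sphere closest to the origin in $\ell_p$ norm ($p > 1$) is the balanced point $(1/D, \ldots, 1/D)$; the function names are hypothetical.

```python
import math


def inscribed_lp_radius(dim, p):
    """Radius of the largest lp ball inside the unit l1 ball, per Lemma 8."""
    if p == math.inf:
        return 1.0 / dim
    return dim ** ((1.0 - p) / p)


def lp_norm(x, p):
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def balanced_point(dim):
    """The point (1/D, ..., 1/D) on the unit l1 sphere."""
    return [1.0 / dim] * dim
```

For example, in $D = 4$ dimensions the largest inscribed $\ell_2$ ball has radius $4^{-1/2}$, which matches the $\ell_2$ norm of the balanced point; inverting the radius gives the threshold $D^{\frac{p-1}{p}} - 1$ on $\epsilon$ that appears in Corollary 11.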

4.1.1 Bounding $\ell_p$ Balls

In general, suppose we probe along some set of $M$ unit directions and at some point we
have at least one negative point supporting an upper bound of $C_0^-$ and $M$ positive points
at a cost of $C_0^+$. However, the lower bound provided by those $M$ positive points
is the cost of the largest $\ell_p$ cost ball that fits entirely within their convex hull; let's say this
cost is $C^\dagger \le C_0^+$. In order to achieve $\epsilon$-multiplicative optimality, we need

    $\frac{C_0^-}{C^\dagger} \le 1 + \epsilon$.

Expanding this, we need

    $\left(\frac{C_0^-}{C_0^+}\right) \left(\frac{C_0^+}{C^\dagger}\right) \le 1 + \epsilon$.

This allows us to break the problem into two parts. The first factor $C_0^- / C_0^+$ is only in terms
of parameters controlled by the multiline search algorithm whereas the second factor $C_0^+ / C^\dagger$
depends only on the shape of the $\ell_p$ ball as it captures how well the ball is approximated by
the convex hull of the search directions. These two factors separate our task into choosing
$M$ and $L_\epsilon^{(*)}$ sufficiently large so that their product is less than $1 + \epsilon$. First we choose factors $\alpha > 0$
and $\beta \ge 0$ so that $(1 + \alpha)(1 + \beta) \le 1 + \epsilon$. Then we choose $M$ so that

    $\frac{C_0^+}{C^\dagger} = 1 + \beta$

and a parameter $\epsilon' = \alpha$ so that multiline search with $M$ directions will achieve

    $\frac{C_0^-}{C_0^+} \le 1 + \alpha$.

In doing so, we create a generalized multiline search that is able to achieve $\epsilon$-multiplicative
optimality.
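As a small worked instance of this split (the numbers are chosen only for illustration and do not come from the paper), take $\epsilon = 0.5$ and suppose the chosen search directions give $\beta = 0.2$. The remaining budget for the binary search is then:

```latex
\[
1 + \alpha \;=\; \frac{1+\epsilon}{1+\beta} \;=\; \frac{1.5}{1.2} \;=\; 1.25,
\qquad\text{so}\qquad
(1+\alpha)(1+\beta) \;=\; 1.25 \times 1.2 \;=\; 1.5 \;=\; 1+\epsilon .
\]
```

Under these assumed values the multiline search would be run with $\epsilon' = \alpha = 0.25$, and any $\beta \ge \epsilon$ would leave no budget for it at all.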

For example, in the case of $p = 1$, we previously saw that choosing $M = 2 \cdot D$ allows us
to exactly reconstruct the $\ell_1$ ball so that $C_0^+ / C^\dagger = 1$ (i.e., $\beta = 0$). Thus we can just make
$\alpha = \epsilon$ and we recover our original multiline search method exactly.

Objective: Below we present a number of results that deal with cases when $\beta > 0$. In
this case, what we want to show is that a ratio of $\frac{C_0^+}{C^\dagger} = 1 + \beta$ can be achieved with a
polynomial number of search directions when $\beta < \epsilon$; otherwise, $(1 + \alpha)(1 + \beta) > 1 + \epsilon$.
Thus, we will be trying to find how many search directions are required to achieve

    $\frac{C_0^+}{C^\dagger} \le 1 + \epsilon$

since this is the highest we can allow this ratio to be. Moreover, since this problem scales
linearly with $C_0^+$ we will simply examine the values of $C^\dagger$ that can be achieved for the unit
cost ball (i.e., w.l.o.g. we make $C_0^+ = 1$ and rescale). Thus we will be looking at how many
points are required to achieve:

    $C^\dagger \ge \frac{1}{1 + \epsilon}$.    (9)

We will try to show that only polynomially many are required for at least some values of $\epsilon$.

Lemma 9 If there exists a configuration of $M$ unit search directions with a convex hull
that yields a bound $C^\dagger$ for the cost function $A$, then multi-line search algorithms can use
those search directions to achieve $\epsilon$-multiplicative optimality with a query complexity that is
polynomial in $M$ and $L_\epsilon^{(*)}$ for any $\epsilon > \frac{1}{C^\dagger} - 1$.

Corollary 10 If there exists a configuration of $M$ unit search directions with a convex
hull that yields a bound $C^\dagger = 1$ for the cost function $A$, then multi-line search algorithms
can use those search directions to achieve $\epsilon$-multiplicative
optimality with a query complexity that is polynomial in $M$ and $L_\epsilon^{(*)}$ for any $\epsilon > 0$.

As this corollary reaffirms, for $p = 1$ using the $M = 2 \cdot D$ coordinate directions allows
multi-line search algorithms to achieve $\epsilon$-multiplicative optimality for any $\epsilon > 0$ with a
query complexity that is polynomial in $M$ and $L_\epsilon^{(*)}$.

4.1.2 Multiline Search for $p < 1$

A simple result holds here. Namely, since the unit $\ell_1$ ball bounds any unit $\ell_p$ ball with
$p < 1$, we can achieve $C_0^+ / C^\dagger = 1$ using only the $2 \cdot D$ corners of the hyperoctahedron as
search directions. Thus we can efficiently search for $p < 1$ for any value of $\epsilon > 0$. Whether
or not the $\ell_p$ ($p < 1$) cost functions can be efficiently searched with fewer search directions
is an open question.

4.1.3 Multiline Search for $p > 1$

For this case, we can trivially use the $\ell_1$ bound on $\ell_p$ balls as summarized by the following
corollary:

Corollary 11 For $1 < p < \infty$ and $\epsilon \in (D^{\frac{p-1}{p}} - 1, \infty)$ any multi-line search algorithm
can achieve $\epsilon$-multiplicative optimality on $A_p$ using $M = 2 \cdot D$ search directions. Similarly
for $p = \infty$ and $\epsilon \in (D - 1, \infty)$ any multi-line search algorithm can achieve $\epsilon$-multiplicative
optimality on $A_\infty$.

Proof From Lemma 8, the largest co-centered $\ell_p$ ball contained within the unit $\ell_1$ ball has
radius (cost) $D^{\frac{1-p}{p}}$ (or $D^{-1}$ for $p = \infty$). The bounds on $\epsilon$ then follow from Lemma 9. ■

Unfortunately, this result only applies for a range of $\epsilon$ that grows with $D$, which is
insufficient for $\epsilon$-IMAC searchability. In fact, for some fixed values of $\epsilon$, there is no
query-based strategy that can bound $\ell_p$ costs using polynomially-many queries in $D$, as the following
result formalizes.

Theorem 12 For $p > 1$, $D > 0$, any initial bounds $0 < C_0^+ < C_0^-$ on the MAC, and
$0 < \epsilon < 2^{\frac{p-1}{p}} - 1$ (or $0 < \epsilon < 1$ for $p = \infty$), all algorithms must submit at least $\alpha_{p,\epsilon}^D$
membership queries (for some constant $\alpha_{p,\epsilon} > 1$) in the worst case to be $\epsilon$-multiplicatively
optimal on $\mathcal{F}^{\mathrm{convex},'+'}$ for $\ell_p$ costs.

The proof of this theorem is in Appendix \C\ A consequence of this theorem is that 
there is no query-based algorithm that can efficiently find an e-IMAC of any ip cost {p > 1) 
for any < e < 2 p (or < e < 1 for p = oo) on the family jr^ionvex, + ^ However, from 
Theorem 1111 and Lemma [9l multiline-search type algorithms efficiently find the e-IMAC of 
any ip cost {p > 1) for any e £ (D p — l,ooJ(orZ) — l<e<ooforp = oo). It is 

generally unclear if efficient algorithms exist for any values of e between these intervals, but 
in the following section we derive a stronger bound for the case of p = 2. 
4.1.4 Multiline Search for p = 2

Theorem 13 For any $D > 1$, any initial bounds $0 < C_0^- < C_0^+$ on the MAC, and $0 < \epsilon < \frac{C_0^+}{C_0^-} - 1$, all algorithms must submit at least $\alpha_\epsilon^{\frac{D-2}{2}}$ membership queries (where $\alpha_\epsilon = \frac{(1+\epsilon)^2}{(1+\epsilon)^2 - 1} > 1$) in the worst case to be $\epsilon$-multiplicatively optimal on $\mathcal{F}^{\mathrm{convex},'+'}$ for $\ell_2$ costs.

The proof of this result is in Appendix D.

This result says that no algorithm can achieve $\epsilon$-multiplicative optimality for $\ell_2$ costs for any fixed $\epsilon > 0$ using only polynomially-many queries in $D$. However, for a fixed $D$, the bound provided by Theorem 13 suggests that reasonable approximations may be achievable as $\alpha_\epsilon \to 1$.

It may appear that Theorem 13 contradicts Corollary 11. However, Corollary 11 only applies for a range of $\epsilon$ that depends on $D$; i.e., $\epsilon > \sqrt{D} - 1$. Interestingly, substituting this lower bound on $\epsilon$ into the bound given by Theorem 13, we get that the number of required queries for $\epsilon > \sqrt{D} - 1$ need only be

$$M \ge \left(\frac{(1+\epsilon)^2}{(1+\epsilon)^2 - 1}\right)^{\frac{D-2}{2}} = \left(\frac{D}{D-1}\right)^{\frac{D-2}{2}}$$

which is a monotonically increasing function in $D$ that asymptotes at $\sqrt{e} \approx 1.65$. Thus, Theorem 13 and Corollary 11 are in agreement since for $\epsilon > \sqrt{D} - 1$, the former only requires that we need at least 2 queries.
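The monotonicity and the limiting value of this expression are easy to check numerically. The sketch below is only an illustration; `required_queries_at_threshold` is a hypothetical name for the right-hand side of the bound evaluated at $(1+\epsilon)^2 = D$.

```python
import math


def required_queries_at_threshold(dim: int) -> float:
    """Theorem 13 bound at epsilon = sqrt(dim) - 1, i.e. (D / (D - 1)) ** ((D - 2) / 2)."""
    return (dim / (dim - 1.0)) ** ((dim - 2.0) / 2.0)


# Evaluate the bound for D = 2, ..., 2000.
values = [required_queries_at_threshold(d) for d in range(2, 2001)]

if __name__ == "__main__":
    print("D = 2    :", values[0])
    print("D = 10   :", values[8])
    print("D = 2000 :", values[-1])
    print("sqrt(e)  :", math.sqrt(math.e))
```

The values start at 1 for $D = 2$ and approach $\sqrt{e}$ from below, so the lower bound never demands more than 2 queries in this regime.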

4.2 Convex Negative Set 

Algorithm 8 generalizes immediately to all weighted $\ell_p$ costs ($p > 1$) centered at $x^A$ since these costs are convex. For these costs an equivalent separating hyperplane for $y$ can be used in place of Eq. (7). These are given by the equivalent (sub)-gradients for $\ell_p$ cost-balls:



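$$h^{y}_{p,d} = c_d \, \mathrm{sign}\left(y_d - x^A_d\right) \cdot \left(\frac{\left|y_d - x^A_d\right|}{A_p^{(c)}\left(y - x^A\right)}\right)^{p-1}$$

$$h^{y}_{\infty,d} = c_d \, \mathrm{sign}\left(y_d - x^A_d\right) \cdot \mathbb{I}\left[\left|y_d - x^A_d\right| = A_\infty^{(c)}\left(y - x^A\right)\right]$$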

By only changing the cost function $A$ and the separating hyperplane $h^y$ used for the half-space cut in Algorithms 6 and [U, the randomized ellipsoid search can be applied for any weighted $\ell_p$ cost $A_p^{(c)}$.
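As an illustration of such a (sub)-gradient, the sketch below assumes the weighted cost has the form $A_p^{(c)}(y - x^A) = \left(\sum_d c_d |y_d - x^A_d|^p\right)^{1/p}$ for finite $p > 1$; this form, and the helper names `weighted_lp_cost`, `weighted_lp_gradient`, and `halfspace_value`, are assumptions made for the example rather than definitions taken from the text. The gradient at $y$ is the normal of a hyperplane through $y$ that keeps every cheaper point on its negative side.

```python
def weighted_lp_cost(y, x_a, c, p):
    """Assumed weighted l_p cost: (sum_d c_d * |y_d - x_a_d| ** p) ** (1 / p)."""
    return sum(cd * abs(yd - xd) ** p for yd, xd, cd in zip(y, x_a, c)) ** (1.0 / p)


def weighted_lp_gradient(y, x_a, c, p):
    """Gradient of the assumed cost at y (requires cost(y) > 0 and finite p > 1)."""
    cost = weighted_lp_cost(y, x_a, c, p)
    grad = []
    for yd, xd, cd in zip(y, x_a, c):
        diff = yd - xd
        sign = (diff > 0) - (diff < 0)  # -1, 0 or +1
        grad.append(cd * sign * (abs(diff) / cost) ** (p - 1.0))
    return grad


def halfspace_value(z, y, grad):
    """grad . (z - y); negative for every z that is strictly cheaper than y."""
    return sum(g * (zd - yd) for g, zd, yd in zip(grad, z, y))


if __name__ == "__main__":
    y, x_a, c, p = [1.0, -2.0, 0.5], [0.0, 0.0, 0.0], [1.0, 2.0, 3.0], 3.0
    g = weighted_lp_gradient(y, x_a, c, p)
    print("cost at y:", weighted_lp_cost(y, x_a, c, p))
    print("gradient :", g)
    print("cut value at x_a:", halfspace_value(x_a, y, g))
```

Because the cost is convex, any point with lower cost than $y$ yields a negative `halfspace_value`, which is the property a half-space cut needs.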

For more general convex costs $A$, we still have that the set of all points $x$ with $A(x) \le C$ (the sublevel set of cost $C$) is a subset of the sublevel set of cost $D$ for all $D > C$; thus, the separating hyperplanes for the sublevel set at cost $D$ will also be separating hyperplanes for the sublevel set at cost $C$. The SetSearch procedure therefore is applicable for any convex cost function $A$ so long as we can compute the separating hyperplanes of any sublevel set of $A$ for any point $y$ not in the sublevel set (footnote 8).

8. The sublevel set of any convex function is a convex set (see Boyd and Vandenberghe, 2004), so such a separating hyperplane always exists but may not be simple to compute.

For non-convex costs $A$ such as weighted $\ell_p$ costs with $p < 1$, minimizing on a convex set $\mathcal{X}_f^-$ is generally a hard problem. However, there may be special cases when minimizing such a cost can be accomplished efficiently.

5. Conclusions and Future Work 

In this paper we study e-IMAC searchability of convex-inducing classifiers. We present 
membership query algorithms that efficiently accomplish e-IMAC search on this family. 
When the positive class is convex we demonstrate very efficient techniques that outperform 
the previous reverse-engineering approaches for linear classifiers. When the negative class 
is convex, we apply a randomized Ellipsoid method to achieve efficient e-IMAC search. 
If the adversary is unaware of which set is convex, they can trivially run both searches 
to discover an e-IMAC with a combined polynomial query complexity. We also show our 
algorithms can be efficiently extended to cope with a number of special circumstances. Most 
importantly, we demonstrate that these algorithms can succeed without reverse engineering 
the classifier. Instead, these algorithms systematically eliminate inconsistent hypotheses 
and progressively concentrate their efforts in an ever-shrinking neighborhood of a MAC 
instance. By doing so, these algorithms only require polynomially-many queries in spite of 
the size of the family of all convex-inducing classifiers. 

We also consider general $\ell_p$ costs and show that $\mathcal{F}^{\mathrm{convex}}$ is only $\epsilon$-IMAC searchable for both positive and negative convexity for any $\epsilon > 0$ if $p = 1$. For $0 < p < 1$, the MultiLineSearch algorithms of Section 3.1 achieve identical results when the positive set is convex, but the non-convexity of these $\ell_p$ costs precludes the use of our randomized Ellipsoid
method. The Ellipsoid method does provide an efficient solution for convex negative sets 
when p > 1 (since these costs are convex). However, for convex positive sets, our results 
show that for p > 1 there is no algorithm that can efficiently find an e-IMAC for all e > 0. 
Moreover, for p = 2 we prove that there is no efficient algorithm for finding an e-IMAC for 
any fixed value of e. 

By studying e-IMAC searchability, we provide a broader picture of how machine learning 
techniques are vulnerable to query-based evasion attacks. Exploring near-optimal evasion is 
important for understanding how an adversary may circumvent learners in security-sensitive 
settings. In such an environment, system developers are hesitant to trust procedures that 
may create vulnerabilities. The algorithms we demonstrate are invaluable tools not for 
an adversary to develop better attacks but rather for analysts to better understand the 
vulnerabilities of their filters. Our algorithms may not necessarily be easily used by an 
adversary since various real-world obstacles would first need to be overcome. Queries may 
only be partially observable or noisy and the feature set may only be partially known. 
Moreover, an adversary may not be able to query all $x \in \mathcal{X}$; instead their queries must be legitimate objects (such as email) that are mapped into $\mathcal{X}$. A real-world adversary must invert the feature-mapping, a generally difficult task. These limitations necessitate further research on the impact of partial observability and approximate querying on $\epsilon$-IMAC search, and on the design of more secure filters. Broader open problems include: is $\epsilon$-IMAC search possible on other classes of learners such as SVMs (linear in a large, possibly infinite, feature space)?
Is e-IMAC search feasible against an online learner that adapts as it is queried? Can learners 
be made resilient to these threats and how does this impact learning performance? 
Appendix A. Proof of Theorems for MultiLineSearch Algorithms

To analyze the worst case of K-STEP MultiLineSearch (Algorithm 3), we consider a malicious classifier that maximizes the number of queries. We refer to the agent that queries the classifier as the adversary.

Proof of Theorem 5 At each iteration of Algorithm 3, the adversary chooses some direction, $e$, not yet eliminated from $\mathcal{W}$. Every direction in $\mathcal{W}$ is feasible (i.e., could yield an $\epsilon$-IMAC) and the malicious classifier, by definition, will make this choice as costly as possible. During the $K$ steps of binary search along this direction, regardless of which direction $e$ is selected or how the malicious classifier responds, the candidate multiplicative gap (see Section 2.2) along $e$ will shrink by an exponent of $2^{-K}$; i.e.,

$$\frac{B^+}{B^-} = \left(\frac{C^+}{C^-}\right)^{2^{-K}} \qquad (10)$$

$$\log\left(G'_{t+1}\right) = \log\left(G_t\right) \cdot 2^{-K} \qquad (11)$$

The primary decision for the malicious classifier occurs when the adversary begins querying other directions besides $e$. At iteration $t$, the malicious classifier has 2 options:

Case 1 ($t \in \mathcal{C}_1$): Respond with '+' for all remaining directions. Here the bound candidates $B^+$ and $B^-$ are verified and thus the new gap is reduced by an exponent of $2^{-K}$; however, no directions are eliminated from the search.

Case 2 ($t \in \mathcal{C}_2$): Choose at least 1 direction to respond with '$-$'. Here since only the value of $C^-$ changes, the malicious classifier can choose to respond to the first $K$ queries so that the gap decreases by a negligible amount (by always responding with '+' during the first $K$ queries along $e$, the gap only decreases by an exponent of $(1 - 2^{-K})$). However, the malicious classifier must choose some number $E_t \ge 1$ of directions that will be eliminated.



We conservatively assume that the gap only decreases for case 1, which decouples the analysis of the queries for $\mathcal{C}_1$ and $\mathcal{C}_2$ and allows us to upper bound the total number of queries made by the algorithm. By this assumption, if $t \in \mathcal{C}_1$ we have $G_t = G_{t-1}^{2^{-K}}$, whereas if $t \in \mathcal{C}_2$, we have $G_t = G_{t-1}$. By analyzing the gap before and after the final iteration $T$, it can be shown that

$$|\mathcal{C}_1| = \left\lceil \frac{L_\epsilon}{K} \right\rceil \qquad (12)$$

since, for the algorithm to terminate, there must be a total of at least $L_\epsilon$ binary search steps made during the case 1 iterations and each case 1 iteration takes exactly $K$ steps.

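At every case 1 iteration, the adversary makes exactly $K + |\mathcal{W}_t| - 1$ queries where $\mathcal{W}_t$ is the set of feasible directions remaining at the $t$-th iteration. While $\mathcal{W}_t$ is controlled by the malicious classifier, we can apply the bound $|\mathcal{W}_t| \le |\mathcal{W}|$. Using this and the relation from Eq. (12), we can bound the number of queries $Q_1$ used in case 1 by

$$\begin{aligned}
Q_1 &\le \sum_{t \in \mathcal{C}_1} \left(K + |\mathcal{W}| - 1\right) \\
&= \left\lceil \frac{L_\epsilon}{K} \right\rceil \cdot \left(K + |\mathcal{W}| - 1\right) \\
&\le L_\epsilon + K + \left\lceil \frac{L_\epsilon}{K} \right\rceil \cdot \left(|\mathcal{W}| - 1\right) ,
\end{aligned}$$

where the last step uses $\left\lceil \frac{L_\epsilon}{K} \right\rceil \cdot K \le L_\epsilon + K$.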

For each case 2 iteration, we make exactly $K + E_t$ queries and this causes the elimination of $E_t \ge 1$ directions; hence, $|\mathcal{W}_{t+1}| = |\mathcal{W}_t| - E_t$. A malicious classifier will always make $E_t = 1$ whenever they use case 2 since that maximally limits how much the adversary gains. Nevertheless, since case 2 requires the elimination of at least 1 direction, we have $|\mathcal{C}_2| \le |\mathcal{W}| - 1$ and moreover, regardless of the choice of $E_t$, we have $\sum_{t \in \mathcal{C}_2} E_t \le |\mathcal{W}| - 1$ since each direction can be eliminated no more than once. Thus,

$$\begin{aligned}
Q_2 &= \sum_{t \in \mathcal{C}_2} \left(K + E_t\right) \\
&\le |\mathcal{C}_2| \cdot K + |\mathcal{W}| - 1 \\
&\le \left(|\mathcal{W}| - 1\right)\left(K + 1\right) .
\end{aligned}$$
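The total number of queries used by Algorithm 3 is therefore bounded by

$$\begin{aligned}
Q = Q_1 + Q_2 &\le L_\epsilon + K + \left\lceil \frac{L_\epsilon}{K} \right\rceil \left(|\mathcal{W}| - 1\right) + \left(|\mathcal{W}| - 1\right)\left(K + 1\right) \\
&\le L_\epsilon + \left\lceil \frac{L_\epsilon}{K} \right\rceil |\mathcal{W}| + K \cdot |\mathcal{W}| + |\mathcal{W}| \\
&= L_\epsilon + \left(\left\lceil \frac{L_\epsilon}{K} \right\rceil + K + 1\right) |\mathcal{W}| .
\end{aligned}$$

Finally, choosing $K = \left\lceil \sqrt{L_\epsilon} \right\rceil$ minimizes this expression and using $L_\epsilon / \left\lceil \sqrt{L_\epsilon} \right\rceil \le \sqrt{L_\epsilon}$ and substituting $K$ into $Q$'s bound, we have

$$Q \le L_\epsilon + \left(2 \left\lceil \sqrt{L_\epsilon} \right\rceil + 1\right) |\mathcal{W}| .$$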
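The choice of $K$ can be sanity-checked by brute force. The following sketch uses hypothetical helper names (`query_bound`, `simplified_bound`, `ceil_sqrt`) and treats the number of binary search steps and the number of directions as plain integers; it compares the bound at $K = \lceil \sqrt{L} \rceil$ against every other integer choice of $K$.

```python
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def ceil_sqrt(n: int) -> int:
    """Smallest integer k with k * k >= n, for n >= 1."""
    return math.isqrt(n - 1) + 1


def query_bound(num_steps: int, k: int, num_directions: int) -> int:
    """L + (ceil(L / K) + K + 1) * |W|, the bound before K is chosen."""
    return num_steps + (ceil_div(num_steps, k) + k + 1) * num_directions


def simplified_bound(num_steps: int, num_directions: int) -> int:
    """L + (2 * ceil(sqrt(L)) + 1) * |W|, the bound after substituting K."""
    return num_steps + (2 * ceil_sqrt(num_steps) + 1) * num_directions


if __name__ == "__main__":
    steps, directions = 13, 20
    k = ceil_sqrt(steps)
    best = min(query_bound(steps, j, directions) for j in range(1, steps + 1))
    print("K =", k, "bound =", query_bound(steps, k, directions), "best over K =", best)
    print("simplified bound =", simplified_bound(steps, directions))
```

In this example the bound at $K = \lceil \sqrt{L} \rceil$ matches the best integer choice, and the simplified expression is never smaller than it.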



Appendix B. Proof of Lower Bounds 



Here we give proofs for the lower bound theorems in Section 3.1.2, first giving the proof for the more complicated multiplicative case followed by a similar proof sketch for the additive case. For these lower bounds, $D$ is the dimension of the space, $A : \mathbb{R}^D \to \mathbb{R}^+$ is any positive convex function, $0 < C_0^- < C_0^+$ are initial lower and upper bounds on the MAC, and $\hat{\mathcal{F}}^{\mathrm{convex},'+'} \subset \mathcal{F}^{\mathrm{convex},'+'}$ is the set of classifiers consistent with the constraints on the MAC; i.e., for $f \in \hat{\mathcal{F}}^{\mathrm{convex},'+'}$ we have that $\mathcal{X}_f^+$ is convex, $\mathbb{B}^{C_0^-}(A) \subset \mathcal{X}_f^+$, and $\mathbb{B}^{C_0^+}(A) \not\subset \mathcal{X}_f^+$.
Proof of Theorems 6 and 7 Suppose a query-based algorithm submits $N < D + 1$ membership queries $x^1, \ldots, x^N \in \mathbb{R}^D$ to the classifier. For the algorithm to be $\epsilon$-optimal, these queries must constrain all consistent classifiers $\hat{\mathcal{F}}^{\mathrm{convex},'+'}$ to have a common point among their $\epsilon$-IMAC sets. Suppose that the responses to the queries are consistent with the classifier $f$ defined as:

$$f(x) = \begin{cases} +1 , & \text{if } A(x) < C_0^+ \\ -1 , & \text{otherwise} \end{cases} \qquad (13)$$

For this classifier, $\mathcal{X}_f^+$ is convex since $A$ is a convex function, $\mathbb{B}^{C_0^-}(A) \subset \mathcal{X}_f^+$ since $C_0^- < C_0^+$, and $\mathbb{B}^{C_0^+}(A) \not\subset \mathcal{X}_f^+$ since $\mathcal{X}_f^+$ is the open $C_0^+$-ball whereas $\mathbb{B}^{C_0^+}(A)$ is the closed $C_0^+$-ball. Moreover, since $\mathcal{X}_f^+$ is the open $C_0^+$-ball, $\nexists\, x \in \mathcal{X}_f^-$ s.t. $A(x) < C_0^+$; therefore $\mathrm{MAC}(f, A) = C_0^+$, and any $\epsilon$-optimal points $x' \in \epsilon\text{-IMAC}^{(*)}(f, A)$ must satisfy $C_0^+ \le A(x') \le (1 + \epsilon) C_0^+$. Similarly, any $\eta$-optimal points $x' \in \eta\text{-IMAC}^{(+)}(f, A)$ must satisfy $C_0^+ \le A(x') \le C_0^+ + \eta$.
Consider an alternative classifier $g$ that responds identically to $f$ for $x^1, \ldots, x^N$ but has a different convex positive set $\mathcal{X}_g^+$. Without loss of generality, suppose the first $M \le N$ queries are positive and the remaining are negative. Let $\mathcal{G} = \mathrm{conv}\left(x^1, \ldots, x^M\right)$; that is, the convex hull of the $M$ positive queries. Now let $\mathcal{X}_g^+$ be the convex hull of $\mathcal{G}$ and the $C_0^-$-ball of $A$: $\mathcal{X}_g^+ = \mathrm{conv}\left(\mathcal{G} \cup \mathbb{B}^{C_0^-}(A)\right)$. Since $\mathcal{G}$ contains all positive queries and $C_0^- < C_0^+$, the convex set $\mathcal{X}_g^+$ is consistent with the observed responses, $\mathbb{B}^{C_0^-}(A) \subset \mathcal{X}_g^+$ by definition, and $\mathbb{B}^{C_0^+}(A) \not\subset \mathcal{X}_g^+$ since the positive queries are all inside the open $C_0^+$-sublevel set. Further, since $M \le N < D + 1$, $\mathcal{G}$ is contained in a proper affine subspace of $\mathbb{R}^D$ and hence $\mathrm{int}(\mathcal{G}) = \emptyset$. Hence, there is always some point from $\mathbb{B}^{C_0^-}(A)$ that is on the boundary of $\mathcal{X}_g^+$; i.e., $\mathbb{B}^{C_0^-}(A) \not\subset \mathrm{int}(\mathcal{G})$ because $\mathrm{int}(\mathcal{G}) = \emptyset$ and $\mathbb{B}^{C_0^-}(A) \neq \emptyset$. Hence, there must be at least one point from $\mathbb{B}^{C_0^-}(A)$ on the boundary of the convex hull of $\mathbb{B}^{C_0^-}(A)$ and $\mathcal{G}$. Hence, $\mathrm{MAC}(g, A) = \inf_{x \in \mathcal{X}_g^-} \left[A(x)\right] = C_0^-$. Since the accuracy $\epsilon < \frac{C_0^+}{C_0^-} - 1$, any $x \in \epsilon\text{-IMAC}^{(*)}(g, A)$ must have

$$A(x) \le (1 + \epsilon) C_0^- < \frac{C_0^+}{C_0^-} C_0^- = C_0^+$$

whereas any $y \in \epsilon\text{-IMAC}^{(*)}(f, A)$ must have $A(y) \ge C_0^+$. Thus, $\epsilon\text{-IMAC}^{(*)}(f, A) \cap \epsilon\text{-IMAC}^{(*)}(g, A) = \emptyset$ and we have constructed two convex-inducing classifiers $f$ and $g$ both consistent with the query responses with no common $\epsilon\text{-IMAC}^{(*)}$. Similarly, since $\eta < C_0^+ - C_0^-$, any $x \in \eta\text{-IMAC}^{(+)}(g, A)$ must have

$$A(x) \le \eta + C_0^- < C_0^+ - C_0^- + C_0^- = C_0^+ ,$$

whereas any $y \in \eta\text{-IMAC}^{(+)}(f, A)$ must have $A(y) \ge C_0^+$. Thus, $\eta\text{-IMAC}^{(+)}(f, A) \cap \eta\text{-IMAC}^{(+)}(g, A) = \emptyset$ and so the two convex-inducing classifiers $f$ and $g$ also have no common $\eta\text{-IMAC}^{(+)}$.

Suppose instead that a query-based algorithm submits $N < L_\epsilon^{(*)}$ membership queries (or $N < L_\eta^{(+)}$ for the additive case). Recall our definitions: $C_0^+$ is the initial upper bound on the MAC, $C_0^-$ is the initial lower bound on the MAC, and $G_t^{(*)} = C_t^+ / C_t^-$ is the gap between the upper bound and lower bound at iteration $t$ ($G_t^{(+)} = C_t^+ - C_t^-$ for the additive case). Here, the malicious classifier $f$ responds with

$$f(x^t) = \begin{cases} +1 , & \text{if } A(x^t) \le \sqrt{C_{t-1}^- \cdot C_{t-1}^+} \\ -1 , & \text{otherwise} \end{cases} \qquad (14)$$

When the classifier responds with '+', $C_t^-$ increases to no more than $\sqrt{C_{t-1}^- \cdot C_{t-1}^+}$ and so $G_t \ge \sqrt{G_{t-1}}$. Similarly when this classifier responds with '$-$', $C_t^+$ decreases to no less than $\sqrt{C_{t-1}^- \cdot C_{t-1}^+}$ and so again $G_t \ge \sqrt{G_{t-1}}$. Thus, these responses ensure that at each iteration $G_t \ge \sqrt{G_{t-1}}$ and since the algorithm cannot terminate until $G_N \le 1 + \epsilon$, we have $N \ge L_\epsilon^{(*)}$ from Eq. (5) (or in the additive case $N \ge L_\eta^{(+)}$ from Eq. (3)). Again we have constructed two convex-inducing classifiers with consistent query responses but with no common $\epsilon$-IMAC. The first classifier's positive set is the smallest cost-ball enclosing all positive queries, while the second classifier's positive set is the largest cost-ball enclosing all positive queries but no negatives. The MAC values of these sets differ by more than a factor of $(1 + \epsilon)$ if $N < L_\epsilon^{(*)}$ (or, for the additive case, by a difference of more than $\eta$ if $N < L_\eta^{(+)}$), so they have no common $\epsilon$-IMAC. ■
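The effect of the malicious responses in Eq. (14) can be simulated directly for the multiplicative case. The sketch below is an illustration under stated assumptions: the closed form used in `min_steps_multiplicative` (the ceiling of $\log_2$ of $\log G_0 / \log(1+\epsilon)$) is assumed here because Eq. (5) lies outside this appendix, and the two query strategies are hypothetical examples of how an algorithm might pick the cost of its next query.

```python
import math


def min_steps_multiplicative(c_minus: float, c_plus: float, eps: float) -> int:
    """Assumed step count: ceil(log2(log(G_0) / log(1 + eps))), requires c_plus > c_minus."""
    ratio = math.log(c_plus / c_minus) / math.log(1.0 + eps)
    return max(0, math.ceil(math.log2(ratio)))


def geometric_mean_query(c_minus: float, c_plus: float) -> float:
    return math.sqrt(c_minus * c_plus)


def arithmetic_mean_query(c_minus: float, c_plus: float) -> float:
    return 0.5 * (c_minus + c_plus)


def steps_against_malicious(c_minus, c_plus, eps, choose_query) -> int:
    """Count queries until the gap C+ / C- is at most 1 + eps."""
    steps = 0
    while c_plus / c_minus > 1.0 + eps:
        cost = choose_query(c_minus, c_plus)
        # Malicious response: '+' exactly when the queried cost is at most
        # the geometric mean of the current bounds.
        if cost <= math.sqrt(c_minus * c_plus):
            c_minus = cost  # '+' response raises the lower bound
        else:
            c_plus = cost   # '-' response lowers the upper bound
        steps += 1
    return steps


if __name__ == "__main__":
    lo, hi, eps = 1.0, 16.0, 0.1
    print("assumed minimum :", min_steps_multiplicative(lo, hi, eps))
    print("geometric mean  :", steps_against_malicious(lo, hi, eps, geometric_mean_query))
    print("arithmetic mean :", steps_against_malicious(lo, hi, eps, arithmetic_mean_query))
```

Querying at the geometric mean halves the logarithm of the gap at every step, which is the best any strategy can do against these responses; other strategies need at least as many queries.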
Appendix C. Proof of Theorem 12

First we introduce the following lemma for the $D$-dimensional hypercube graphs: a collection of $2^D$ nodes of the form $(\pm 1, \pm 1, \ldots, \pm 1)$ where each node has an edge to every other node that is Hamming distance 1 from it.

Lemma 14 For any $0 < \delta \le 1/2$, to cover a $D$-dimensional hypercube graph so that every vertex has a Hamming distance of at most $\lfloor \delta D \rfloor$ to some vertex in the covering, the number of vertices in the covering must be at least

$$2^{D (1 - H(\delta))}$$

where $H(\delta) = -\delta \log_2(\delta) - (1 - \delta) \log_2(1 - \delta)$ is the entropy of $\delta$.

Proof There are $2^D$ vertices in the $D$-dimensional hypercube graph. Each vertex in the covering is within a Hamming distance of at most $h$ for exactly $\sum_{k=0}^{h} \binom{D}{k}$ vertices. Thus, one needs at least $2^D / \left(\sum_{k=0}^{h} \binom{D}{k}\right)$ vertices to cover the hypercube graph. Taking $h = \lfloor \delta D \rfloor$, we now apply the bound

$$\sum_{k=0}^{\lfloor \delta D \rfloor} \binom{D}{k} \le 2^{H(\delta) D}$$

to the denominator, which is valid for any $0 < \delta \le 1/2$. ■
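Both steps of this counting argument can be checked numerically for small dimensions. The helper names below (`binary_entropy`, `hamming_ball_size`, `covering_lower_bound`) are hypothetical and exist only for this sketch.

```python
import math


def binary_entropy(delta: float) -> float:
    """H(delta) in bits, with H(0) = H(1) = 0."""
    if delta in (0.0, 1.0):
        return 0.0
    return -delta * math.log2(delta) - (1.0 - delta) * math.log2(1.0 - delta)


def hamming_ball_size(dim: int, radius: int) -> int:
    """Number of hypercube vertices within Hamming distance `radius` of a vertex."""
    return sum(math.comb(dim, k) for k in range(radius + 1))


def covering_lower_bound(dim: int, delta: float) -> float:
    """The entropy form of the covering bound, 2 ** (D * (1 - H(delta)))."""
    return 2.0 ** (dim * (1.0 - binary_entropy(delta)))


if __name__ == "__main__":
    dim, delta = 40, 0.25
    radius = int(delta * dim)
    exact = 2 ** dim / hamming_ball_size(dim, radius)
    print("counting bound 2^D / |ball| :", exact)
    print("entropy bound               :", covering_lower_bound(dim, delta))
```

The entropy form is looser than the exact counting bound but makes the exponential dependence on $D$ explicit.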



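Lemma 15 The minimum of the $\ell_p$ cost function $A_p$ from any target $x^A$ on the halfspace $\mathcal{H}_{w,b} = \left\{x \mid x^\top w \ge b^\top w\right\}$ can be expressed in terms of the equivalent hyperplane $x^\top w \ge d$ parameterized by a normal vector $w$ and displacement $d = \left(b - x^A\right)^\top w$ as

$$\min_{x \in \mathcal{H}_{w,d}} A_p(x) = \begin{cases} d \cdot \|w\|_{\frac{p}{p-1}}^{-1} , & \text{if } d > 0 \\ 0 , & \text{otherwise} \end{cases} \qquad (15)$$

for all $1 < p < \infty$ and is

$$\min_{x \in \mathcal{H}_{w,d}} A_\infty(x) = \begin{cases} d \cdot \|w\|_1^{-1} , & \text{if } d > 0 \\ 0 , & \text{otherwise} \end{cases} \qquad (16)$$

for $p = \infty$.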



Proof For $1 < p < \infty$, minimizing $A_p$ on the halfspace $\mathcal{H}_{w,b}$ is equivalent to finding a minimizer for

$$\min_x \frac{1}{p} \sum_{i=1}^{D} |x_i|^p \quad \text{s.t. } x^\top w \ge d .$$

Clearly, if $d \le 0$ then the vector $0$ (corresponding to $x^A$ in the transformed space) trivially satisfies the constraint and minimizes the cost function with cost $0$, which yields the second case of Eq. (15). For the case $d > 0$, we construct the Lagrangian

$$\mathcal{L}(x, \lambda) = \frac{1}{p} \sum_{i=1}^{D} |x_i|^p - \lambda \left(x^\top w - d\right) .$$

Differentiating this with respect to $x$ and setting that partial derivative equal to zero yields

$$x_i^* = \mathrm{sign}(w_i) \left(\lambda |w_i|\right)^{\frac{1}{p-1}} .$$

Plugging this back into the Lagrangian yields

$$\mathcal{L}(x^*, \lambda) = \frac{1-p}{p} \lambda^{\frac{p}{p-1}} \sum_{i=1}^{D} |w_i|^{\frac{p}{p-1}} + \lambda d ,$$

which we now differentiate with respect to $\lambda$ and set the derivative equal to zero to yield

$$\lambda^* = \left(\frac{d}{\sum_{i=1}^{D} |w_i|^{\frac{p}{p-1}}}\right)^{p-1} .$$

Plugging this solution into the formula for $x^*$ yields the solution

$$x_i^* = \mathrm{sign}(w_i) \left(\frac{d}{\sum_{j=1}^{D} |w_j|^{\frac{p}{p-1}}}\right) |w_i|^{\frac{1}{p-1}} .$$

The $\ell_p$ cost of this optimal solution is given by

$$A_p(x^*) = d \cdot \|w\|_{\frac{p}{p-1}}^{-1} ,$$

which is the first case of Eq. (15).

For $p = \infty$, once again if $d \le 0$ then the vector $0$ trivially satisfies the constraint and minimizes the cost function with cost $0$, which yields the second case of Eq. (16). For the case $d > 0$, we use the geometry of hypercubes (the equi-cost balls of an $\ell_\infty$ cost function) to derive the first case of Eq. (16). Any optimal solution must occur at a point where the hyperplane given by $x^\top w = b^\top w$ is tangent to a hypercube about $x^A$; this can either occur along a side (face) of the hypercube or at a corner. However, if the plane is tangent along a side (face) it is also tangent at a corner of the hypercube. Hence, there is always an optimal solution at some corner of the optimal cost hypercube.

At a corner of the hypercube, we have the following property:

$$|x_1| = |x_2| = \cdots = |x_D| ;$$

that is, the magnitude of all coordinates of this optimal solution is the same value. Further, the sign of the optimal solution's $i$-th coordinate must agree with the sign of the hyperplane's $i$-th coordinate, $w_i$. These constraints, along with the hyperplane constraint, lead to the following formula for an optimal solution:

$$x_i = d \cdot \mathrm{sign}(w_i) \, \|w\|_1^{-1} .$$

The $\ell_\infty$ cost of these solutions is simply

$$d \cdot \|w\|_1^{-1} .$$
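The closed forms of Lemma 15 can be verified numerically. The sketch below works in the transformed space where the target sits at the origin; `lp_norm`, `min_cost`, and `min_cost_point` are hypothetical helper names introduced for this illustration.

```python
import math


def lp_norm(v, p: float) -> float:
    """The l_p norm of v, with p = inf giving the maximum absolute coordinate."""
    if math.isinf(p):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def min_cost(w, d: float, p: float) -> float:
    """Minimum l_p cost over the halfspace {x : x . w >= d} (Eqs. 15 and 16)."""
    if d <= 0:
        return 0.0
    dual = 1.0 if math.isinf(p) else p / (p - 1.0)
    return d / lp_norm(w, dual)


def min_cost_point(w, d: float, p: float):
    """Closed-form minimizer for finite p > 1 and d > 0."""
    dual = p / (p - 1.0)
    scale = d / sum(abs(wi) ** dual for wi in w)
    return [math.copysign(1.0, wi) * scale * abs(wi) ** (1.0 / (p - 1.0)) for wi in w]


if __name__ == "__main__":
    w, d, p = [1.0, -2.0, 0.5], 3.0, 3.0
    x_star = min_cost_point(w, d, p)
    print("constraint value:", sum(wi * xi for wi, xi in zip(w, x_star)))
    print("cost of x*      :", lp_norm(x_star, p))
    print("closed form     :", min_cost(w, d, p))
```

The minimizer lies exactly on the hyperplane, and by Hölder's inequality no feasible point can have a smaller cost.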
For the proof of Theorem 12 we use the orthants (centered at $x^A$); an orthant is the $D$-dimensional generalization of a quadrant in 2 dimensions. There are $2^D$ orthants in a $D$-dimensional space. We represent each orthant by its canonical representation, which is a vector of $D$ positive or negative ones; i.e., the orthant represented by $a = (\pm 1, \pm 1, \ldots, \pm 1)$ contains the point $x^A + a$ and is the set of all points $x$ satisfying:

$$x_i \in \begin{cases} [0, +\infty) , & \text{if } a_i = +1 \\ (-\infty, 0] , & \text{if } a_i = -1 \end{cases}$$

Proof of Theorem 12 Suppose a query-based algorithm submits $N$ membership queries $x^1, \ldots, x^N \in \mathbb{R}^D$ to the classifier. Again, for the algorithm to be $\epsilon$-optimal, these queries must constrain all consistent classifiers $\hat{\mathcal{F}}^{\mathrm{convex},'+'}$ to have a common point among their $\epsilon$-IMAC sets. Suppose that the responses are consistent with the classifier $f$ defined as

$$f(x) = \begin{cases} +1 , & \text{if } A_p(x) < C_0^+ \\ -1 , & \text{otherwise} \end{cases} \qquad (17)$$

For this classifier, $\mathcal{X}_f^+$ is convex since $A_p$ is a convex function for $p > 1$, $\mathbb{B}^{C_0^-}(A_p) \subset \mathcal{X}_f^+$ since $C_0^- < C_0^+$, and $\mathbb{B}^{C_0^+}(A_p) \not\subset \mathcal{X}_f^+$ since $\mathcal{X}_f^+$ is the open $C_0^+$-ball whereas $\mathbb{B}^{C_0^+}(A_p)$ is the closed $C_0^+$-ball. Moreover, since $\mathcal{X}_f^+$ is the open $C_0^+$-ball, $\nexists\, x \in \mathcal{X}_f^-$ s.t. $A_p(x) < C_0^+$; therefore $\mathrm{MAC}(f, A_p) = C_0^+$, and any $\epsilon$-optimal points $x' \in \epsilon\text{-IMAC}^{(*)}(f, A_p)$ must satisfy $C_0^+ \le A_p(x') \le (1 + \epsilon) C_0^+$.

Now consider an alternative classifier $g$ that responds identically to $f$ for $x^1, \ldots, x^N$ but has a different convex positive set $\mathcal{X}_g^+$. Without loss of generality suppose the first $M \le N$ queries are positive and the remaining are negative. Here we consider a set which is a convex hull of the orthants of all $M$ positive queries; that is,

$$\mathcal{G} = \mathrm{conv}\left(\mathrm{orth}(x^1) \cap \mathcal{X}_f^+ , \; \mathrm{orth}(x^2) \cap \mathcal{X}_f^+ , \; \ldots , \; \mathrm{orth}(x^M) \cap \mathcal{X}_f^+\right)$$

where $\mathrm{orth}(x)$ is some orthant that $x$ lies within relative to $x^A$ (a data point may lie within more than one orthant but we need only select any orthant that contains it in order to cover it). By intersecting each data point's orthant with the set $\mathcal{X}_f^+$ and taking the convex hull of these regions, $\mathcal{G}$ is convex, contains $x^A$ and is a subset of $\mathcal{X}_f^+$ that is also consistent with all the query responses of $f$; i.e., each of the $M$ positive queries is in $\mathcal{X}_g^+$ and all the negative queries are in $\mathcal{X}_g^-$. Moreover, $\mathcal{G}$ is a superset of the convex hull of the $M$ positive queries. Thus, by finding the largest enclosed $\ell_p$ ball within $\mathcal{G}$, we upper bound $\mathrm{MAC}(g, A_p)$.

We now represent each orthant as a vertex in a $D$-dimensional hypercube graph; the Hamming distance between any pair of orthants is the number of different coordinates in their canonical representations and two orthants are adjacent in the graph if and only if they have Hamming distance of 1. Using this notion of Hamming distance, we will seek a $K$-covering of the hypercube. We refer to the orthants used in $\mathcal{G}$ to cover the $M$ positive queries as covering orthants and their corresponding vertices form a covering of the hypercube. Suppose the $M$ covering orthants are sufficient for a $K$ covering but not a $K - 1$ covering; then there must be at least one vertex not in the covering that has at least a $K$ Hamming distance to every vertex in the covering. This vertex corresponds to an empty orthant that differs from all covered orthants in at least $K$ coordinates of their canonical vertices. Without loss of generality, suppose this uncovered orthant has the canonical vertex of all positive ones which we scale to $C_0^+ (+1, +1, \ldots, +1)$. Consider the hyperplane with normal vector $w = (+1, +1, \ldots, +1)$ and displacement

$$d = \begin{cases} C_0^+ (D - K)^{\frac{p-1}{p}} , & \text{if } 1 < p < \infty \\ C_0^+ (D - K) , & \text{if } p = \infty \end{cases}$$

that specifies the function $s(x) = x^\top w - d = \sum_{i=1}^{D} x_i - d$. For this hyperplane, the vertex $C_0^+ (+1, +1, \ldots, +1)$ yields

$$s\left(C_0^+ (+1, +1, \ldots, +1)\right) = C_0^+ D - d > 0 .$$

Also for any orthant $a$ with Hamming distance at least $K$ from this uncovered orthant, we have that for any $x \in \mathrm{orth}(a) \cap \mathcal{X}_f^+$, by definition of the orthant and $\mathcal{X}_f^+$, the function $s$ yields

$$s(x) = \sum_{i=1}^{D} x_i - d = \underbrace{\sum_{\{i \mid a_i = +1\}} x_i}_{\ge 0} + \underbrace{\sum_{\{i \mid a_i = -1\}} x_i}_{\le 0} - d .$$

Since all the terms in the second summation are non-positive, the second sum is at most 0. Further, by maximizing the first summation, we upper bound $s(x)$. The summation $\sum_{\{i \mid a_i = +1\}} x_i$ (with the constraint that $\|x\|_p < C_0^+$) has at most $D - K$ terms and is maximized by $x_i = C_0^+ (D - K)^{-1/p}$ (or $x_i = C_0^+$ for $p = \infty$), for which the first summation is upper bounded by $C_0^+ (D - K)^{\frac{p-1}{p}}$ or $C_0^+ (D - K)$ for $p = \infty$; i.e., it is upper bounded by $d$. Thus we see that

$$s(x) \le 0 .$$

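Thus, this hyperplane separates the scaled vertex $C_0^+ (+1, +1, \ldots, +1)$ from each set $\mathrm{orth}(a) \cap \mathcal{X}_f^+$ where $a$ is the canonical representation of any orthant with a Hamming distance of at least $K$. Thus, this hyperplane also separates the scaled vertex from $\mathcal{G}$ by the properties of the convex hull. Since the displacement $d > 0$, by applying Lemma 15 this separating hyperplane upper bounds the cost of the largest $\ell_p$ ball enclosed in $\mathcal{G}$ as

$$\mathrm{MAC}(g, A_p) \le C_0^+ (D - K)^{\frac{p-1}{p}} \cdot \|\mathbf{1}\|_{\frac{p}{p-1}}^{-1} = C_0^+ \left(\frac{D - K}{D}\right)^{\frac{p-1}{p}}$$

for $1 < p < \infty$ and

$$\mathrm{MAC}(g, A_\infty) \le C_0^+ (D - K) \cdot \|\mathbf{1}\|_1^{-1} = C_0^+ \, \frac{D - K}{D}$$

for $p = \infty$. Since we have an upper bound on the MAC of $g$ and the MAC of $f$ is $C_0^+$, in order to have a common $\epsilon$-IMAC between these classifiers, we must have

$$(1 + \epsilon) \ge \begin{cases} \left(\frac{D}{D - K}\right)^{\frac{p-1}{p}} , & \text{if } 1 < p < \infty \\ \frac{D}{D - K} , & \text{if } p = \infty . \end{cases}$$

Solving for the value of $K$ required to achieve a desired accuracy of $1 + \epsilon$ we have

$$K \le \begin{cases} \frac{(1 + \epsilon)^{\frac{p}{p-1}} - 1}{(1 + \epsilon)^{\frac{p}{p-1}}} \, D , & \text{if } 1 < p < \infty \\ \frac{\epsilon}{1 + \epsilon} \, D , & \text{if } p = \infty \end{cases}$$

which bounds the size of the covering required to achieve the desired accuracy.

For the case $1 < p < \infty$, by Lemma 14 there must be

$$M \ge \exp\left\{ \ln(2) \cdot D \left( 1 - H\left( \frac{(1 + \epsilon)^{\frac{p}{p-1}} - 1}{(1 + \epsilon)^{\frac{p}{p-1}}} \right) \right) \right\}$$

vertices of the hypercube in the covering to achieve any desired accuracy $0 < \epsilon < 2^{\frac{p-1}{p}} - 1$, for which

$$\frac{(1 + \epsilon)^{\frac{p}{p-1}} - 1}{(1 + \epsilon)^{\frac{p}{p-1}}} < \frac{1}{2}$$

as required by the Lemma. Moreover, since $0 < H(\delta) < 1$ for any $0 < \delta < 1/2$,

$$\alpha_{p,\epsilon} = \exp\left\{ \ln(2) \left( 1 - H\left( \frac{(1 + \epsilon)^{\frac{p}{p-1}} - 1}{(1 + \epsilon)^{\frac{p}{p-1}}} \right) \right) \right\} > 1$$

and we have

$$M \ge \alpha_{p,\epsilon}^{D} .$$

Similarly for $p = \infty$, Lemma 14 can be applied yielding

$$M \ge 2^{D \left(1 - H\left(\frac{\epsilon}{1 + \epsilon}\right)\right)}$$

to achieve any desired accuracy $0 < \epsilon < 1$ (for which $\epsilon / (1 + \epsilon) < 1/2$ as required by the Lemma). Again, by the properties of entropy the constant $\alpha_{\infty,\epsilon} = 2^{\left(1 - H\left(\frac{\epsilon}{1 + \epsilon}\right)\right)} > 1$ for $0 < \epsilon < 1$ and we have

$$M \ge \alpha_{\infty,\epsilon}^{D} .$$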
Figure 4: This figure depicts the geometry of spherical caps. (a) A spherical cap of height $h$ is shown that is created by a plane passing through the sphere. The green region represents the area of the cap. (b) We see the geometry of the spherical cap. Notice that the intersecting hyperplane forms a right triangle with the centroid of the hypersphere. The length of the first side of that triangle is $R - h$, its hypotenuse is length $R$, and its other side is length $\sqrt{h (2R - h)}$. The half-angle $\phi$ of the right circular cone can also be used to parameterize the cap.



Appendix D. Proof of Theorem 13

For this proof, we build on previous results for covering hyperspheres. The proof is based on the following covering number result by Wyner and Shannon which bounds the minimum number of spherical caps required to cover a hypersphere. A $D$-dimensional spherical cap is the region formed by the intersection of a halfspace and a hypersphere facing away from the center of the hypersphere as depicted in Figure 4. This cap is parameterized by the hypersphere's radius $R$ and the half-angle $\phi$ about a central radius (through the peak of the cap) as in the right-most diagram of Figure 4.

Based on these formulas, we now derive a bound on the number of spherical caps of half-angle $\phi$ required to cover the sphere, mirroring the result due to Wyner (1965).



Lemma 16 (Result based on Wyner 1965) Covering the surface of a $D$-dimensional hypersphere of radius $R$ requires at least

$$\left(\frac{1}{\sin \phi}\right)^{D-2}$$

spherical caps of half-angle $\phi$.

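Proof In Capabilities of Bounded Discrepancy Decoding, Wyner showed that the minimal number, $M$, of spherical caps of half-angle $\phi$ required to cover a $D$-dimensional hypersphere of radius $R$ is given by

$$M \ge \frac{D \sqrt{\pi} \, \Gamma\left(\frac{D+1}{2}\right)}{(D - 1) \, \Gamma\left(1 + \frac{D}{2}\right)} \left[ \int_0^{\phi} \sin^{D-2}(t) \, dt \right]^{-1} .$$

This result follows directly from computing the surface area of the hypersphere and the spherical caps.

We continue by upper bounding the above integral to obtain a looser but more interpretable lower bound on $M$. Integrals of the form $\int_0^{\phi} \sin^{D}(t) \, dt$ also arise in computing the volume of a spherical cap. This volume (and thus the integral) can be bounded by enclosing the cap within a hypersphere; cf. Ball (1997). This yields the following bound:

$$\int_0^{\phi} \sin^{D}(t) \, dt \le \frac{\sqrt{\pi} \, \Gamma\left(\frac{D+1}{2}\right)}{\Gamma\left(1 + \frac{D}{2}\right)} \cdot \sin^{D} \phi .$$

Using this bound on the integral (with $D - 2$ in place of $D$), our bound on the size of the covering is

$$M \ge \frac{D \, \Gamma\left(\frac{D+1}{2}\right) \Gamma\left(\frac{D}{2}\right)}{(D - 1) \, \Gamma\left(1 + \frac{D}{2}\right) \Gamma\left(\frac{D-1}{2}\right)} \cdot \left(\frac{1}{\sin \phi}\right)^{D-2} .$$

Now using properties of the gamma function, it can be shown that $\frac{\Gamma\left(\frac{D+1}{2}\right) \Gamma\left(\frac{D}{2}\right)}{\Gamma\left(1 + \frac{D}{2}\right) \Gamma\left(\frac{D-1}{2}\right)} = \frac{D-1}{D}$, so that after canceling terms we arrive at our result:

$$M \ge \left(\frac{1}{\sin \phi}\right)^{D-2} . \; ■$$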
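The gamma-function identity used in the final step follows from $\Gamma(z + 1) = z\,\Gamma(z)$ and can be confirmed numerically. The sketch below uses hypothetical helper names (`gamma_ratio`, `cap_covering_bound`) and works with log-gamma values to avoid overflow for larger $D$.

```python
import math


def gamma_ratio(dim: int) -> float:
    """Gamma((D+1)/2) * Gamma(D/2) / (Gamma(1 + D/2) * Gamma((D-1)/2)) for D >= 2."""
    log_ratio = (
        math.lgamma((dim + 1) / 2.0)
        + math.lgamma(dim / 2.0)
        - math.lgamma(1.0 + dim / 2.0)
        - math.lgamma((dim - 1) / 2.0)
    )
    return math.exp(log_ratio)


def cap_covering_bound(dim: int, half_angle: float) -> float:
    """Lower bound (1 / sin(phi)) ** (D - 2) on the number of caps (Lemma 16)."""
    return (1.0 / math.sin(half_angle)) ** (dim - 2)


if __name__ == "__main__":
    for dim in (2, 3, 10, 100):
        print(dim, gamma_ratio(dim), (dim - 1) / dim)
    print("caps needed, D = 20, phi = 30 degrees:", cap_covering_bound(20, math.pi / 6))
```

The ratio equals $(D-1)/D$, which cancels the leading factor $D/(D-1)$ and leaves only the power of $1/\sin\phi$.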



Proof of Theorem 13 Suppose a query-based algorithm submits $N < D + 1$ membership queries $x^1, \ldots, x^N \in \mathbb{R}^D$ to the classifier. For the algorithm to be $\epsilon$-optimal, these queries must constrain all consistent classifiers $\hat{\mathcal{F}}^{\mathrm{convex},'+'}$ to have a common point among their $\epsilon$-IMAC sets. Suppose that all the responses are consistent with the classifier $f$ defined as

$$f(x) = \begin{cases} +1 , & \text{if } A_2(x) < C_0^+ \\ -1 , & \text{otherwise} \end{cases} \qquad (18)$$

For this classifier, $\mathcal{X}_f^+$ is convex since $A_2$ is a convex function, $\mathbb{B}^{C_0^-}(A_2) \subset \mathcal{X}_f^+$ since $C_0^- < C_0^+$, and $\mathbb{B}^{C_0^+}(A_2) \not\subset \mathcal{X}_f^+$ since $\mathcal{X}_f^+$ is the open $C_0^+$-ball whereas $\mathbb{B}^{C_0^+}(A_2)$ is the closed $C_0^+$-ball. Moreover, since $\mathcal{X}_f^+$ is the open $C_0^+$-ball, $\nexists\, x \in \mathcal{X}_f^-$ s.t. $A_2(x) < C_0^+$; therefore $\mathrm{MAC}(f, A_2) = C_0^+$, and any $\epsilon$-optimal points $x' \in \epsilon\text{-IMAC}^{(*)}(f, A_2)$ must satisfy $C_0^+ \le A_2(x') \le (1 + \epsilon) C_0^+$.

Now consider an alternative classifier $g$ that responds identically to $f$ for $x^1, \ldots, x^N$ but has a different convex positive set $\mathcal{X}_g^+$. Without loss of generality suppose the first $M \le N$ queries are positive and the remaining are negative. Let $\mathcal{G} = \mathrm{conv}\left(x^1, \ldots, x^M\right)$; that is, the convex hull of the $M$ positive queries. We will assume $x^A \in \mathcal{G}$ since if it is not, then we construct the set $\mathcal{X}_g^+$ as in the proof for Theorems 6 and 7 above and achieve $\mathrm{MAC}(g, A_2) = C_0^-$, thereby showing our desired result. Now consider the points $z^i = C_0^+ \frac{x^i}{A_2(x^i)}$; i.e., the projection of each of the positive queries onto the surface of the $\ell_2$ ball $\mathbb{B}^{C_0^+}(A_2)$. Since each positive query lies along the line between $x^A$ and its projection $z^i$, by convexity and the fact that $x^A \in \mathcal{G}$, we have $\mathcal{G} \subset \mathrm{conv}\left(z^1, z^2, \ldots, z^M\right)$. We will call this enlarged hull $\hat{\mathcal{G}}$. These $M$ projected points $\{z^i\}$ must form a covering of the $C_0^+$-hypersphere as the loci of caps of half-angle $\phi^* = \arccos\left((1 + \epsilon)^{-1}\right)$. If not, then there exists some point on the surface of this hypersphere that is at least an angle $\phi^*$ from all $z^i$ points and the resulting $\phi^*$-cap centered at this uncovered point is not in $\hat{\mathcal{G}}$ (since a cap is defined as the intersection of the hypersphere and a halfspace). Moreover, by definition of the $\phi^*$-cap, it achieves a minimal $\ell_2$ cost of $C_0^+ \cos \phi^*$. Thus, if we fail to achieve a $\phi^*$-covering of the $C_0^+$-hypersphere, the alternative classifier $g$ has $\mathrm{MAC}(g, A_2) < C_0^+ \cos \phi^* = C_0^+ / (1 + \epsilon)$ and any $x \in \epsilon\text{-IMAC}^{(*)}(g, A_2)$ must have

$$A_2(x) \le (1 + \epsilon) \, \mathrm{MAC}(g, A_2) < (1 + \epsilon) \frac{C_0^+}{1 + \epsilon} = C_0^+$$

whereas any $y \in \epsilon\text{-IMAC}^{(*)}(f, A_2)$ must have $A_2(y) \ge C_0^+$. Thus, we would have $\epsilon\text{-IMAC}^{(*)}(f, A_2) \cap \epsilon\text{-IMAC}^{(*)}(g, A_2) = \emptyset$ and thus fail to achieve $\epsilon$-multiplicative optimality. Thus, we have shown that a $\phi^*$-covering is necessary for $\epsilon$-multiplicative optimality. However, from Lemma 16, to have a $\phi^*$-covering we must have

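$$M \ge \left(\frac{1}{\sin \phi^*}\right)^{D-2} .$$

Using the trigonometric identity $\sin(\arccos(x)) = \sqrt{1 - x^2}$ we can substitute for $\phi^*$ and find

$$M \ge \left(\frac{1}{\sin\left(\arccos\left(\frac{1}{1 + \epsilon}\right)\right)}\right)^{D-2} = \left(\frac{(1 + \epsilon)^2}{(1 + \epsilon)^2 - 1}\right)^{\frac{D-2}{2}} . \; ■$$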